Graduate-only lectures denoted with '*'



What Is An Algorithm? 

Algorithms are the ideas behind computer programs. 

An algorithm is the thing which stays the same whether 
the program is in Pascal running on a Cray in New York 
or is in BASIC running on a Macintosh in Kathmandu! 

To be interesting, an algorithm has to solve a general, 
specified problem. An algorithmic problem is specified 
by describing the set of instances it must work on and 
what desired properties the output must have. 



Example: Sorting 

Input: A sequence of n numbers a_1, ..., a_n

Output: the permutation (reordering) of the input
sequence such that a_1 <= a_2 <= ... <= a_n.

We seek algorithms which are correct and efficient.



Correctness 

For any algorithm, we must prove that it always returns 
the desired output for all legal instances of the problem. 



For sorting, this means even if (1) the input is already 
sorted, or (2) it contains repeated elements. 



Correctness is Not Obvious! 

The following problem arises often in manufacturing 
and transportation testing applications. 

Suppose you have a robot arm equipped with a tool, 
say a soldering iron. To enable the robot arm to do 
a soldering job, we must construct an ordering of the 
contact points, so the robot visits (and solders) the 
first contact point, then visits the second point, third, 
and so forth until the job is done. 

Since robots are expensive, we need to find the order
which minimizes the time (i.e., travel distance) it takes
to assemble the circuit board.




You are given the job to program the robot arm. Give 
me an algorithm to find the best tour! 



Nearest Neighbor Tour 

A very popular solution starts at some point p_0 and then
walks to its nearest neighbor p_1 first, then repeats from
p_1, etc. until done.

Pick and visit an initial point p_0
p = p_0
i = 0
While there are still unvisited points
    i = i + 1
    Let p_i be the closest unvisited point to p_{i-1}
    Visit p_i
Return to p_0 from p_i



This algorithm is simple to understand and implement 
and very efficient. However, it is not correct! 
Always starting from the leftmost point or any other
point will not fix the problem. 
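
The nearest neighbor rule can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names and the five-point instance on a line are my own assumptions, chosen so the heuristic visibly zig-zags.

```python
import math

def nearest_neighbor_tour(points, start=0):
    """Return the visiting order (as indices) chosen by the nearest-neighbor rule."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        # Closest unvisited point to the point we are standing on.
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def tour_length(points, order):
    """Length of the closed tour, including the return to the start."""
    return sum(math.dist(points[order[i - 1]], points[order[i]])
               for i in range(len(order)))

# Hypothetical instance: five points on a line, starting at x = 0.
points = [(0, 0), (1, 0), (-2, 0), (5, 0), (-10, 0)]
order = nearest_neighbor_tour(points)
```

On this instance the heuristic hops 0, 1, -2, 5, -10 and back for a tour of length 36, while sweeping from one end of the line to the other and returning costs only 30.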



Closest Pair Tour 

Always walking to the closest point is too restrictive, 
since that point might trap us into making moves we 
don't want. 



Another idea would be to repeatedly connect the closest
pair of points whose connection will not cause a
cycle or a three-way branch to be formed, until we
have a single chain with all the points in it.

Let n be the number of points in the set
For i = 1 to n - 1 do
    d = ∞
    For each pair of endpoints (x, y) of distinct partial paths
        If dist(x, y) <= d then
            x_m = x, y_m = y, d = dist(x, y)
    Connect (x_m, y_m) by an edge
Connect the two endpoints by an edge.

Although it works correctly on the previous example, 
other data causes trouble: 





This algorithm is not correct! 



A Correct Algorithm 

We could try all possible orderings of the points, then
select the ordering which minimizes the total length:

d = ∞
For each of the n! permutations P_i of the n points
    If (cost(P_i) <= d) then
        d = cost(P_i) and P_min = P_i
Return P_min



Since all possible orderings are considered, we are guar- 
anteed to end up with the shortest possible tour. 

Because it tries all n! permutations, it is extremely slow,
much too slow to use when there are more than 10-20
points.

No efficient, correct algorithm is known for the traveling
salesman problem, as we will see later.
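
A brute-force search along these lines might look like the sketch below. The names are hypothetical, and as a small variation on the pseudocode it fixes point 0 as the start, so it tries (n-1)! orderings rather than n! (rotating a closed tour does not change its length).

```python
import math
from itertools import permutations

def tour_length(points, order):
    """Length of the closed tour, including the return to the start."""
    return sum(math.dist(points[order[i - 1]], points[order[i]])
               for i in range(len(order)))

def optimal_tour(points):
    """Try every ordering that starts at point 0 and keep the shortest."""
    best_order, best_len = None, math.inf
    for rest in permutations(range(1, len(points))):
        order = (0,) + rest
        length = tour_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len

# Hypothetical instance: five points on a line.
points = [(0, 0), (1, 0), (-2, 0), (5, 0), (-10, 0)]
best_order, best_len = optimal_tour(points)
```

With five points this examines 24 orderings; with 20 points it would examine 19!, which is why the approach stops being usable so quickly.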



Efficiency

"Why not just use a supercomputer?"

Supercomputers are for people too rich and too stupid
to design efficient algorithms!

A faster algorithm running on a slower computer will
always win for sufficiently large instances, as we shall
see.

Usually, problems don't have to get that large before
the faster algorithm wins.



Expressing Algorithms 

We need some way to express the sequence of steps 
comprising an algorithm. 

In order of increasing precision, we have English, pseu- 
docode, and real programming languages. Unfortu- 
nately, ease of expression moves in the reverse order. 

I prefer to describe the ideas of an algorithm in English, 
moving to pseudocode to clarify sufficiently tricky de- 
tails of the algorithm. 



The RAM Model 

Algorithms are the only important, durable, and origi- 
nal part of computer science because they can be stud- 
ied in a machine and language independent way. 

The reason is that we will do all our design and analysis 
for the RAM model of computation: 

• Each "simple" operation (+, -, =, if, call) takes
exactly 1 step.

• Loops and subroutine calls are not simple opera- 
tions, but depend upon the size of the data and 
the contents of a subroutine. We do not want 
"sort" to be a single step operation. 

• Each memory access takes exactly 1 step. 

We measure the run time of an algorithm by counting 
the number of steps. 

This model is useful and accurate in the same sense as 
the flat-earth model (which is useful)! 



Best, Worst, and Average-Case 

The worst case complexity of the algorithm is the function
defined by the maximum number of steps taken
on any instance of size n.



[Figure: number of steps plotted for the worst case, average case, and best case complexity]



The best case complexity of the algorithm is the function
defined by the minimum number of steps taken on
any instance of size n.

The average-case complexity of the algorithm is the
function defined by the average number of steps taken
over all instances of size n.



Each of these complexities defines a numerical function 
- time vs. size! 



Insertion Sort 

One way to sort an array of n elements is to start with
an empty list, then successively insert new elements in
the proper position:

a_1 <= a_2 <= ... <= a_k | a_{k+1} ... a_n

At each stage, the inserted element leaves a sorted
list, and after n insertions contains exactly the right
elements. Thus the algorithm must be correct.

But how efficient is it? 

Note that the run time changes with the permutation 
instance! (even for a fixed size problem) 

How does insertion sort do on sorted permutations? 

How about unsorted permutations? 



Exact Analysis of Insertion Sort 

Count the number of times each line of pseudocode
will be executed.

Line  InsertionSort(A)                     #Inst.  #Exec.
1     for j := 2 to len. of A do           c1      n
2         key := A[j]                      c2      n-1
3         /* put A[j] into A[1..j-1] */    c3=0    /
4         i := j-1                         c4      n-1
5         while i > 0 & A[i] > key do      c5      t_2 + ... + t_n
6             A[i+1] := A[i]               c6      (t_2 - 1) + ... + (t_n - 1)
7             i := i-1                     c7      (t_2 - 1) + ... + (t_n - 1)
8         A[i+1] := key                    c8      n-1



The for statement is executed (n-1)+1 times (why?)

Within the for statement, "key:=A[j]" is executed n-1
times.

Steps 5, 6, 7 are harder to count.

Let t_j = 1 + the number of elements that have to be
slid right to insert the jth item.

Step 5 is executed t_2 + t_3 + ... + t_n times.

Step 6 is executed (t_2 - 1) + (t_3 - 1) + ... + (t_n - 1) times.



Add up the executed instructions for all pseudocode
lines to get the run-time of the algorithm:

$$c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1)$$

What are the t_j's? They depend on the particular input.



Best Case 

If it's already sorted, all t_j's are 1.

Hence, the best case time is

$$c_1 n + (c_2 + c_4 + c_5 + c_8)(n-1) = Cn + D$$

where C and D are constants.



Worst Case 

If the input is sorted in descending order, we will have
to slide all of the already-sorted elements, so t_j = j,
and step 5 is executed

$$\sum_{j=2}^{n} j = (n^2 + n)/2 - 1$$

times.
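
To see the best and worst case counts concretely, here is a sketch of insertion sort instrumented to count how often the while test (step 5) and the shift (step 6) execute. The function name and the two counters are my own additions for illustration; the array is 0-indexed, unlike the pseudocode.

```python
def insertion_sort_with_counts(items):
    """Sort a copy of items; also return (while-test count, shift count)."""
    a = list(items)
    tests = 0    # executions of the while test, i.e. the sum of the t_j
    shifts = 0   # executions of the shift, i.e. the sum of (t_j - 1)
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while True:
            tests += 1
            if not (i >= 0 and a[i] > key):
                break
            a[i + 1] = a[i]   # slide the larger element one place right
            shifts += 1
            i -= 1
        a[i + 1] = key
    return a, tests, shifts

best = insertion_sort_with_counts([1, 2, 3, 4, 5])    # already sorted
worst = insertion_sort_with_counts([5, 4, 3, 2, 1])   # descending order
```

For n = 5 the sorted input makes n - 1 = 4 tests and no shifts, while the descending input makes (n^2 + n)/2 - 1 = 14 tests and 10 shifts.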



How can we modify almost any algorithm to have a 
good best-case running time? 



To improve the best case, all we have to do is to be
able to solve one instance of each size efficiently. We
could modify our algorithm to first test whether the
input is the special instance we know how to solve,
and then output the canned answer.

For sorting, we can check if the values are already
ordered, and if so output them. For the traveling
salesman, we can check if the points lie on a line, and if so
output the points in that order.

The supercomputer people pull this trick on the Linpack
benchmarks!



Because it is so easy to cheat with the best case running
time, we usually don't rely too much on it.

Because it is usually very hard to compute the average 
running time, since we must somehow average over all 
the instances, we usually strive to analyze the worst 
case running time. 

The worst case is usually fairly easy to analyze and 
often close to the average or real running time. 



Exact Analysis is Hard! 

We have agreed that the best, worst, and average case 
complexity of an algorithm is a numerical function of 
the size of the instances. 




However, it is difficult to work with exactly because it 
is typically very complicated! 

Thus it is usually cleaner and easier to talk about upper 
and lower bounds of the function. 



This is where the dreaded big O notation comes in! 

Since running our algorithm on a machine which is
twice as fast will affect the running times by a
multiplicative constant of 2, we are going to have to ignore
constant factors anyway.



Names of Bounding Functions 

Now that we have clearly defined the complexity functions
we are talking about, we can talk about upper
and lower bounds on them:

• g(n) = O(f(n)) means C x f(n) is an upper bound
on g(n).

• g(n) = Ω(f(n)) means C x f(n) is a lower bound
on g(n).

• g(n) = Θ(f(n)) means C_1 x f(n) is an upper bound
on g(n) and C_2 x f(n) is a lower bound on g(n).

Got it? C, C_1, and C_2 are all constants independent of
n.

All of these definitions imply a constant n_0 beyond
which they are satisfied. We do not care about small
values of n.



O, Ω, and Θ

[Figure: panels (a), (b), (c) illustrating the Θ, O, and Ω bounds, with c_1 g(n), c_2 g(n), c g(n), and n_0 marked]



The value of n_0 shown is the minimum possible value;
any greater value would also work.

(a) f(n) = Θ(g(n)) if there exist positive constants n_0,
c_1, and c_2 such that to the right of n_0, the value of
f(n) always lies between c_1 g(n) and c_2 g(n) inclusive.

(b) f(n) = O(g(n)) if there are positive constants n_0
and c such that to the right of n_0, the value of f(n)
always lies on or below c g(n).

(c) f(n) = Ω(g(n)) if there are positive constants n_0
and c such that to the right of n_0, the value of f(n)
always lies on or above c g(n).

Asymptotic notation (O, Θ, Ω) is about as well as we can
practically deal with complexity functions.



What does all this mean? 



3n^2 - 100n + 6 = O(n^2) because 3n^2 > 3n^2 - 100n + 6
3n^2 - 100n + 6 = O(n^3) because 0.01n^3 > 3n^2 - 100n + 6
3n^2 - 100n + 6 ≠ O(n) because c n < 3n^2 when n > c

3n^2 - 100n + 6 = Ω(n^2) because 2.99n^2 < 3n^2 - 100n + 6
3n^2 - 100n + 6 ≠ Ω(n^3) because 3n^2 - 100n + 6 < n^3
3n^2 - 100n + 6 = Ω(n) because 10^{10^{10}} n < 3n^2 - 100n + 6

3n^2 - 100n + 6 = Θ(n^2) because O and Ω
3n^2 - 100n + 6 ≠ Θ(n^3) because O only
3n^2 - 100n + 6 ≠ Θ(n) because Ω only

Each inequality holds for all sufficiently large n.
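
The threshold n_0 hiding behind two of these claims can be found numerically. The sketch below is only a finite check up to an arbitrary horizon, not a proof, and the helper names are my own; it uses integer arithmetic (299 n^2 < 100 f(n) stands in for 2.99 n^2 < f(n)) to avoid rounding.

```python
def f(n):
    return 3 * n * n - 100 * n + 6

def upper_bound_holds(n):
    # 3n^2 > f(n), the O(n^2) claim with c = 3
    return 3 * n * n > f(n)

def lower_bound_holds(n):
    # 2.99 n^2 < f(n), the Omega(n^2) claim with c = 2.99, scaled by 100
    return 299 * n * n < 100 * f(n)

def smallest_threshold(predicate, horizon):
    """Smallest n0 such that predicate holds for every n in [n0, horizon)."""
    n0 = horizon
    for n in range(horizon - 1, -1, -1):
        if not predicate(n):
            break
        n0 = n
    return n0

n0_upper = smallest_threshold(upper_bound_holds, 20000)
n0_lower = smallest_threshold(lower_bound_holds, 20000)
```

The upper bound kicks in at n_0 = 1, but the lower bound with c = 2.99 needs n_0 = 10000, which is a good reminder of why we do not care about small values of n.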



Think of the equality as meaning "is a member of" the set
of functions.

Note that time complexity is every bit as well defined
a function as sin(x) or your bank account as a function
of time.



Testing Dominance 

f(n) dominates g(n) if $\lim_{n \to \infty} g(n)/f(n) = 0$, which is
the same as saying g(n) = o(f(n)).

Note the little-oh - it means "grows strictly slower
than".

Knowing the dominance relation between common functions
is important because we want algorithms whose
time complexity is as low as possible in the hierarchy.
If f(n) dominates g(n), f is much larger (i.e., slower)
than g.



n^a dominates n^b if a > b since

$$\lim_{n \to \infty} n^b / n^a = \lim_{n \to \infty} n^{b-a} = 0$$

n^a + o(n^a) doesn't dominate n^a since

$$\lim_{n \to \infty} n^a / (n^a + o(n^a)) = 1$$



Complexity   n = 10         n = 20         n = 30         n = 40
n            0.00001 sec    0.00002 sec    0.00003 sec    0.00004 sec
n^2          0.0001 sec     0.0004 sec     0.0009 sec     0.0016 sec
n^3          0.001 sec      0.008 sec      0.027 sec      0.064 sec
n^5          0.1 sec        3.2 sec        24.3 sec       1.7 min
2^n          0.001 sec      1.0 sec        17.9 min       12.7 days
3^n          0.059 sec      58 min         6.5 years      3855 cent

(The entries correspond to one step per microsecond.)



Logarithms 



It is important to understand deep in your bones what 
logarithms are and where they come from. 

A logarithm is simply an inverse exponential function.
Saying b^x = y is equivalent to saying that x = log_b y.

Exponential functions, like the amount owed on an n
year mortgage at an interest rate of c% per year, are
functions which grow distressingly fast, as anyone who
has tried to pay off a mortgage knows.

Thus inverse exponential functions, i.e., logarithms,
grow refreshingly slowly.

Binary search is an example of an O(lgn) algorithm. 
After each comparison, we can throw away half the 
possible number of keys. Thus twenty comparisons 
suffice to find any name in the million-name Manhattan 
phone book! 

If you have an algorithm which runs in O(lgn) time, 
take it, because this is blindingly fast even on very 
large instances. 
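
The phone book claim is easy to check with a sketch of binary search that counts its probes. The function name and the probe counter are assumptions added for illustration; a range of integers stands in for the sorted names.

```python
def binary_search(sorted_keys, target):
    """Return (index or -1, number of probes) for target in sorted_keys."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                  # one comparison against the middle key
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1             # throw away the lower half
        else:
            hi = mid - 1             # throw away the upper half
    return -1, probes

phone_book = list(range(1_000_000))  # stand-in for a million sorted names
index, probes = binary_search(phone_book, 999_999)
```

Each probe at least halves the remaining range, and a range of 1,000,000 keys can be halved only 20 times before it is empty, so no search, successful or not, takes more than 20 probes.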



Properties of Logarithms 

Recall the definition, c^{log_c x} = x.

Asymptotically, the base of the log does not matter:

$$\log_b a = \frac{\log_c a}{\log_c b}$$

Thus, log_2 n = (1/log_100 2) x log_100 n, and note that
1/log_100 2 = 6.643 is just a constant.



Asymptotically, any polynomial
function of n does not matter:

Note that

log(n^{473} + n^2 + n + 96) = O(log n)

since n^{473} + n^2 + n + 96 = O(n^{473}), and log n^{473} =
473 * log n.

Any exponential dominates every polynomial. This
is why we will seek to avoid exponential time
algorithms.



Federal Sentencing Guidelines 

2F1.1. Fraud and Deceit; Forgery; Offenses Involving 
Altered or Counterfeit Instruments other than Coun- 
terfeit Bearer Obligations of the United States. 

(a) Base offense Level: 6 

(b) Specific offense Characteristics 

(1) If the loss exceeded $2,000, increase the offense 
level as follows: 



Loss (Apply the Greatest)       Increase in Level
(A) $2,000 or less              no increase
(B) More than $2,000            add 1
(C) More than $5,000            add 2
(D) More than $10,000           add 3
(E) More than $20,000           add 4
(F) More than $40,000           add 5
(G) More than $70,000           add 6
(H) More than $120,000          add 7
(I) More than $200,000          add 8
(J) More than $350,000          add 9
(K) More than $500,000          add 10
(L) More than $800,000          add 11
(M) More than $1,500,000        add 12
(N) More than $2,500,000        add 13
(O) More than $5,000,000        add 14
(P) More than $10,000,000       add 15
(Q) More than $20,000,000       add 16
(R) More than $40,000,000       add 17
(S) More than $80,000,000       add 18



The federal sentencing guidelines are designed to help
judges be consistent in assigning punishment. The
time-to-serve is a roughly linear function of the total
level.

However, notice that the increase in level as a function
of the amount of money you steal grows logarithmically
in the amount of money stolen.

This very slow growth means it pays to commit one
crime stealing a lot of money, rather than many small
crimes adding up to the same amount of money,
because the time to serve if you get caught is much less.

The Moral: "If you are gonna do the crime, make it
worth the time!"



Working with the Asymptotic Notation

Suppose f(n) = O(n^2) and g(n) = O(n^2).

What do we know about g'(n) = f(n) + g(n)? Adding
the bounding constants shows g'(n) = O(n^2).

What do we know about g''(n) = f(n) - g(n)? Since
the bounding constants don't necessarily cancel,
g''(n) = O(n^2).

We know nothing about the lower bounds on g' and g''
because we know nothing about lower bounds on f, g.

Suppose f(n) = Ω(n^2) and g(n) = Ω(n^2).

What do we know about g'(n) = f(n) + g(n)? Adding
the lower bounding constants shows g'(n) = Ω(n^2).

What do we know about g''(n) = f(n) - g(n)? We
know nothing about the lower bound of this!



The Complexity of Songs 

Suppose we want to sing a song which lasts for n units
of time. Since n can be large, we want to memorize
songs which require only a small amount of brain space,
i.e. memory.

Let S(n) be the space complexity of a song which lasts
for n units of time.

The amount of space we need to store a song can
be measured in either the words or characters needed
to memorize it. Note that the number of characters
is Θ(words) since every word in a song is at most 34
letters long - Supercalifragilisticexpialidocious!

What bounds can we establish on S(n)?

• S(n) = O(n), since in the worst case we must
explicitly memorize every word we sing - "The
Star-Spangled Banner"

• S(n) = Ω(1), since we must know something about
our song to sing it.



The Refrain 

Most popular songs have a refrain, which is a block of 
text which gets repeated after each stanza in the song: 



    Bye, bye Miss American pie
    Drove my chevy to the levy but the levy was dry
    Them good old boys were drinking whiskey and rye
    Singing this will be the day that I die.

Refrains make a song easier to remember, since you
memorize it once yet sing it O(n) times. But do they
reduce the space complexity?

Not according to the big oh. If

n = repetitions x (verse-size + refrain-size)

Then the space complexity is still O(n) since it is only
halved (if the verse-size = refrain-size):

S(n) = repetitions x verse-size + refrain-size



The k Days of Christmas 

To reduce S(n), we must structure the song differently.

Consider "The k Days of Christmas". All one must
memorize is:

    On the kth Day of Christmas, my true love
    gave to me, gift_k

    On the First Day of Christmas, my true love
    gave to me, a partridge in a pear tree

But the time it takes to sing it is

$$\sum_{i=1}^{k} i = k(k+1)/2 = \Theta(k^2)$$

If n = Θ(k^2), then k = Θ(sqrt(n)), so S(n) = O(sqrt(n)).



100 Bottles of Beer 

What do kids sing on really long car trips? 

    n bottles of beer on the wall,
    n bottles of beer.
    You take one down and pass it around
    n - 1 bottles of beer on the wall.

All you must remember in this song is this template
of size O(1), and the current value of n. The storage
size for n depends on its value, but log_2 n bits suffice.

Thus for this song, S(n) = O(lg n).



Is there a song which eliminates even the need to 
count? 

That's the way, uh-huh, uh-huh 
I like it, uh-huh, huh 

Reference: D. Knuth, 'The Complexity of Songs', Comm. 
ACM, April 1984, pp. 18-24 



Show that for any real constants a and b, b > 0,

(n + a)^b = Θ(n^b)

To show f(n) = Θ(g(n)), we must show O and Ω. Go
back to the definition!

• Big O - Must show that (n + a)^b <= c_1 n^b for all
n > n_0. When is this true? If c_1 = 2^b, this is true
for all n > |a| since n + a < 2n, and raise both sides
to the b.

• Big Ω - Must show that (n + a)^b >= c_2 n^b for all
n > n_0. When is this true? If c_2 = (1/2)^b, this is
true for all n > 2|a| since n + a > n/2, and raise
both sides to the b.

Note the need for absolute values.



Modeling 



Modeling is the art of formulating your application in
terms of precisely described, well-understood problems.
Proper modeling is the key to applying algorithmic
design techniques to any real-world problem.

Real-world applications involve real-world objects. 

Most algorithms, however, are designed to work on 
rigorously defined abstract structures such as permu- 
tations, graphs, and sets. 

You must first describe your problem abstractly, in 
terms of fundamental structures and properties. 



Combinatorial Objects 

Permutations, which are arrangements, or orderings, of
items. For example, {1,4,3,2} and {4,3,2,1} are
two distinct permutations of the same set of four
integers. Permutations are likely the object in
question whenever your problem seeks an "arrangement,"
"tour," "ordering," or "sequence."

Subsets, which represent selections from a set of 
items. For example, {1,3,4} and {2} are two dis- 
tinct subsets of the first four integers. Order does 
not matter in subsets the way it does with permu- 
tations, so the subsets {1,3,4} and {4,3, 1} would 
be considered identical. Subsets are likely the 
object in question whenever your problem seeks 
a "cluster," "collection," "committee," "group," 
"packaging," or "selection." 

Strings, which represent sequences of characters 
or patterns. For example, the names of students 
in a class can be represented by strings. Strings 
are likely the object in question whenever you are 
dealing with "text," "characters," "patterns," or 
"labels." 



Relationship Models



Trees, which represent hierarchical relationships
between items. Figure (a) illustrates a portion
of the family tree of the Skiena clan. Trees
are likely the object in question whenever your
problem seeks a "hierarchy," "dominance relationship,"
"ancestor/descendant relationship," or "taxonomy."

Graphs, which represent relationships between
arbitrary pairs of objects. Figure (b) models a
network of roads as a graph, where the vertices are
cities and the edges are roads connecting pairs
of cities. Graphs are likely the object in question
whenever you seek a "network," "circuit," "web,"
or "relationship."



Geometric Objects 

Points, which represent locations in some geomet- 
ric space. For example, the locations of McDon- 
ald's restaurants can be described by points on a 
map/plane. Points are likely the object in question 
whenever your problems work on "sites," "posi- 
tions," "data records," or "locations." 

Polygons, which represent regions in some geo- 
metric space. For example, the borders of a coun- 
try can be described by a polygon on a map/plane. 
Polygons and polyhedra are likely the object in 
question whenever you are working on "shapes," 
"regions," "configurations," or "boundaries." 



Using the Catalog 

These fundamental structures all have associated 
problems and properties, which are presented in 
the catalog of Part II. 

Familiarity with all of these problems is important, 
because they provide the language we use to model 
applications. 

Understanding all or most of these problems, even 
at a cartoon/definition level, will enable you to 
know where to look later when the problem arises 
in your application. 



Rules for Algorithm Design 

The secret to successful algorithm design, and problem
solving in general, is to make sure you ask the
right questions. Below, I give a possible series of
questions for you to ask yourself as you try to solve difficult
algorithm design problems:

1. Do I really understand the problem? 

(a) What exactly does the input consist of? 

(b) What exactly are the desired results or output? 

(c) Can I construct some examples small enough 
to solve by hand? What happens when I solve 
them? 

(d) Are you trying to solve a numerical problem? A 
graph algorithm problem? A geometric prob- 
lem? A string problem? A set problem? Might 
your problem be formulated in more than one 
way? Which formulation seems easiest? 

2. Can I find a simple algorithm for the problem? 

(a) Can I solve my problem exactly by
searching all subsets or arrangements and
picking the best one?

i. If so, why am I sure that this algorithm al- 
ways gives the correct answer? 

ii. How do I measure the quality of a solution 
once I construct it? 



iii. Does this simple, slow solution run in
polynomial or exponential time?

iv. If I can't find a slow, guaranteed correct
algorithm, am I sure that my problem is well
defined enough to permit a solution?

(b) Can I solve my problem by repeatedly trying
some heuristic rule, like picking the biggest
item first? The smallest item first? A random
item first?

i. If so, on what types of inputs does this heuris- 
tic rule work well? Do these correspond to 
the types of inputs that might arise in the 
application? 

ii. On what types of inputs does this heuristic 
rule work badly? If no such examples can 
be found, can I show that in fact it always 
works well? 

iii. How fast does my heuristic rule come up 
with an answer? 

3. Are there special cases of this problem I know how 
to solve exactly? 

(a) Can I solve it efficiently when I ignore some of 
the input parameters? 

(b) What happens when I set some of the input
parameters to trivial values, such as 0 or 1?



(c) Can I simplify the problem to create a problem 
I can solve efficiently? How simple do I have 
to make it? 

(d) If I can solve a certain special case, why can't 
this be generalized to a wider class of inputs? 

4. Which of the standard algorithm design paradigms 
seem most relevant to the problem? 

(a) Is there a set of items which can be sorted by 
size or some key? Does this sorted order make 
it easier to find what might be the answer? 

(b) Is there a way to split the problem in two 
smaller problems, perhaps by doing a binary 
search, or a partition of the elements into big 
and small, or left and right? If so, does this 
suggest a divide-and-conquer algorithm? 

(c) Are there certain operations being repeatedly
done on the same data, such as searching it for
some element, or finding the largest/smallest
remaining element? If so, can I use a data
structure to speed up these queries, like hash
tables or a heap/priority queue?

5. Am I still stumped? 

(a) Why don't I go back to the beginning of the 
list and work through the questions again? Do 
any of my answers from the first trip change 
on the second? 



(a) Is 2^{n+1} = O(2^n)?

(b) Is 2^{2n} = O(2^n)?

(a) Is 2^{n+1} = O(2^n)?

Is 2^{n+1} <= c * 2^n?
Yes, if c >= 2, for all n.

(b) Is 2^{2n} = O(2^n)?

Is 2^{2n} <= c * 2^n?

Note 2^{2n} = 2^n * 2^n.

Is 2^n * 2^n <= c * 2^n?

Is 2^n <= c?

No! Certainly for any constant c we can find an n such
that this is not true.



Elementary Data Structures 

"Mankind's progress is measured by the number of 
things we can do without thinking." 

Elementary data structures such as stacks, queues,
lists, and heaps will be the "off-the-shelf" components
we build our algorithms from. There are two aspects to
any data structure:

• The abstract operations which it supports.

• The implementation of these operations.

The fact that we can describe the behavior of our data
structures in terms of abstract operations explains why
we can use them without thinking, while the fact that
we have different implementations of the same abstract
operations enables us to optimize performance.



Stacks and Queues

Sometimes, the order in which we retrieve data is
independent of its content, being only a function of when
it arrived.

A stack supports last-in, first-out operations: push and
pop.

A queue supports first-in, first-out operations: enqueue 
and dequeue. 

A deque is a double ended queue and supports all four 
operations: push, pop, enqueue, dequeue. 

Lines in banks are based on queues, while food in my 
refrigerator is treated as a stack. 

Both can be used to traverse a tree, but the order is 
completely different. 





[Figure: tree traversal order using a stack]



Which order is better for WWW crawler robots? 



Stack Implementation

Although this implementation uses an array, a linked
list would eliminate the need to declare the array size
in advance.

STACK-EMPTY(S)
    if top[S] = 0
        then return TRUE
        else return FALSE

PUSH(S, x)
    top[S] ← top[S] + 1
    S[top[S]] ← x

POP(S)
    if STACK-EMPTY(S)
        then error "underflow"
        else top[S] ← top[S] - 1
             return S[top[S] + 1]

[Figure: array S with the position of top marked]

All are O(1) time operations.
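
A runnable counterpart of the array-based stack might look like this. It is a sketch under a few assumptions of mine: the class and method names are hypothetical, the array is 0-indexed so `top` counts the stored items, and I added an overflow check that the pseudocode leaves out.

```python
class ArrayStack:
    """Fixed-capacity stack stored in an array."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.top = 0                     # number of stored items; 0 means empty

    def is_empty(self):
        return self.top == 0

    def push(self, x):
        if self.top == len(self.slots):  # not in the pseudocode: guard the array bound
            raise OverflowError("overflow")
        self.slots[self.top] = x
        self.top += 1

    def pop(self):
        if self.is_empty():
            raise IndexError("underflow")
        self.top -= 1
        return self.slots[self.top]

stack = ArrayStack(3)
for item in ("a", "b", "c"):
    stack.push(item)
popped = [stack.pop(), stack.pop(), stack.pop()]   # last in, first out
```

Every method touches a constant number of array cells, which is where the O(1) bounds come from.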



Queue Implementation 

A circular queue implementation requires pointers to 
the head and tail elements, and wraps around to reuse 
array elements. 

ENQUEUE(Q, x)
    Q[tail[Q]] ← x
    if tail[Q] = length[Q]
        then tail[Q] ← 1
        else tail[Q] ← tail[Q] + 1

[Figure: circular array with occupied cells marked X and the head and tail positions indicated]

DEQUEUE(Q)
    x ← Q[head[Q]]
    if head[Q] = length[Q]
        then head[Q] ← 1
        else head[Q] ← head[Q] + 1
    return x

A list-based implementation would eliminate the
possibility of overflow.

All are O(1) time operations.
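
The wrap-around can be sketched in Python as below. The class is hypothetical; it is 0-indexed and uses the modulus operator for the wrap, and it keeps an explicit `size` field, which is my addition so that a full queue and an empty one can be told apart.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue that reuses array cells by wrapping around."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0    # index of the oldest item
        self.tail = 0    # index where the next item will be written
        self.size = 0    # added: distinguishes full from empty

    def enqueue(self, x):
        if self.size == len(self.slots):
            raise OverflowError("overflow")
        self.slots[self.tail] = x
        self.tail = (self.tail + 1) % len(self.slots)   # wrap around
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("underflow")
        x = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)   # wrap around
        self.size -= 1
        return x

queue = CircularQueue(3)
for item in (1, 2, 3):
    queue.enqueue(item)
first = queue.dequeue()     # frees cell 0
queue.enqueue(4)            # tail wraps around and reuses cell 0
rest = [queue.dequeue(), queue.dequeue(), queue.dequeue()]
```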



Dynamic Set Operations 

Perhaps the most important class of data structures
maintains a set of items, indexed by keys.

There are a variety of implementations of these
dictionary operations, each of which yields different time
bounds for various operations.

• Search(S,k) - A query that, given a set S and a
key value k, returns a pointer x to an element in
S such that key[x] = k, or nil if no such element
belongs to S.

• Insert(S,x) - A modifying operation that augments
the set S with the element x.

• Delete(S,x) - Given a pointer x to an element in
the set S, remove x from S. Observe we are given
a pointer to an element x, not a key value.

• Min(S), Max(S) - Returns the element of the
totally ordered set S which has the smallest (largest)
key.

• Next(S,x), Previous(S,x) - Given an element x
whose key is from a totally ordered set S, returns
the next largest (smallest) element in S, or NIL if
x is the maximum (minimum) element.



Pointer Based Implementation 

We can maintain a dictionary in either a singly or dou- 
bly linked list. 



[Figure: linked list L containing the elements B, C, D, E]

We gain extra flexibility on predecessor queries at a cost
of doubling the number of pointers by using
doubly-linked lists.

Since the extra big-Oh cost of doubly-linked lists is
zero, we will usually assume our lists are doubly linked,
although it might not be necessary.

A singly-linked list is to a doubly-linked list as a
Conga line is to a Can-Can line.



Array Based Sets 

Unsorted Arrays 

• Search(S,k) - sequential search, O(n)

• Insert(S,x) - place in first empty spot, O(1)

• Delete(S,x) - copy nth item to the xth spot, O(1)

• Min(S), Max(S) - sequential search, O(n)

• Successor(S,x), Predecessor(S,x) - sequential search,
O(n)

Sorted Arrays

• Search(S,k) - binary search, O(lg n)

• Insert(S,x) - search, then move to make space,
O(n)

• Delete(S,x) - move to fill up the hole, O(n)

• Min(S), Max(S) - first or last element, O(1)

• Successor(S,x), Predecessor(S,x) - Add or
subtract 1 from pointer, O(1)
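
The sorted-array costs can be seen in a short sketch built on Python's standard `bisect` module. The wrapper functions are hypothetical names of mine, and a Python list stands in for the array.

```python
from bisect import bisect_left, insort

def sorted_search(keys, k):
    """Binary search, O(lg n): index of k in the sorted list, or None."""
    i = bisect_left(keys, k)
    return i if i < len(keys) and keys[i] == k else None

def sorted_insert(keys, k):
    """Find the slot by binary search, then shift the tail right: O(n)."""
    insort(keys, k)

def sorted_delete(keys, i):
    """Remove position i; the items after it move left to fill the hole: O(n)."""
    del keys[i]

keys = [2, 5, 9]
sorted_insert(keys, 7)                 # keys is now [2, 5, 7, 9]
position = sorted_search(keys, 7)      # found at index 2
smallest, largest = keys[0], keys[-1]  # Min and Max are O(1)
successor = keys[position + 1]         # Successor: add 1 to the index
```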

What are the costs for a heap? 



Unsorted List Implementation 

LIST-SEARCH(L, k)
    x = head[L]
    while x <> NIL and key[x] <> k
        do x = next[x]
    return x

Note: the while loop might require two lines in some
programming languages, since key[x] must not be
evaluated when x is NIL.

[Figure: a doubly-linked list with HEAD(L), illustrating insertion]

LIST-INSERT(L, x)
    next[x] = head[L]
    if head[L] <> NIL
        then prev[head[L]] = x
    head[L] = x
    prev[x] = NIL

LIST-DELETE(L, x)
    if prev[x] <> NIL
        then next[prev[x]] = next[x]
        else head[L] = next[x]
    if next[x] <> NIL
        then prev[next[x]] = prev[x]
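
The three list operations translate almost line for line into Python. This is a sketch with hypothetical class names; `None` plays the role of NIL, and the `keys` helper is my addition for inspecting the list.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def search(self, k):
        x = self.head
        # Python's `and` short-circuits, so x.key is never read on None.
        while x is not None and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        """Splice node x in at the front of the list."""
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        x.prev = None

    def delete(self, x):
        """Unlink node x, given a pointer to it (not a key)."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

    def keys(self):
        out, x = [], self.head
        while x is not None:
            out.append(x.key)
            x = x.next
        return out

lst = DoublyLinkedList()
for key in ("A", "B", "C"):
    lst.insert(Node(key))        # list reads C, B, A
lst.delete(lst.search("B"))      # list reads C, A
```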



Sentinels 

Boundary conditions can be eliminated using a sentinel
element which doesn't go away.

LIST-SEARCH'(L, k)
    x = next[nil[L]]
    while x <> nil[L] and key[x] <> k
        do x = next[x]
    return x

LIST-INSERT'(L, x)
    next[x] = next[nil[L]]
    prev[next[nil[L]]] = x
    next[nil[L]] = x
    prev[x] = nil[L]

LIST-DELETE'(L, x)
    next[prev[x]] = next[x]
    prev[next[x]] = prev[x]
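
With a sentinel the same operations lose all their special cases, as this sketch shows. The class names are hypothetical; the sentinel is a dummy node whose `next` is the first element and whose `prev` is the last, so the list is circular.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class SentinelList:
    def __init__(self):
        self.nil = Node(None)        # sentinel: never removed
        self.nil.next = self.nil
        self.nil.prev = self.nil

    def search(self, k):
        """Return the node with key k, or the sentinel if k is absent."""
        x = self.nil.next
        while x is not self.nil and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        x.next = self.nil.next
        self.nil.next.prev = x       # works even when the list is empty
        self.nil.next = x
        x.prev = self.nil

    def delete(self, x):
        x.prev.next = x.next         # no head or tail tests needed
        x.next.prev = x.prev

    def keys(self):
        out, x = [], self.nil.next
        while x is not self.nil:
            out.append(x.key)
            x = x.next
        return out

lst = SentinelList()
for key in ("A", "B", "C"):
    lst.insert(Node(key))
lst.delete(lst.search("C"))          # deleting the first element needs no special case
```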



Hash Tables 

Hash tables are a very practical way to maintain a
dictionary. As with bucket sort, it assumes we know that
the distribution of keys is fairly well-behaved.

The idea is simply that looking an item up in an array
is O(1) once you have its index. A hash function is a
mathematical function which maps keys to integers.

In bucket sort, our hash function mapped the key to a 
bucket based on the first letters of the key. "Collisions" 
were the set of keys mapped to the same bucket. 

If the keys were uniformly distributed, then each bucket 
contains very few keys! 

The resulting short lists were easily sorted, and could 
just as easily be searched! 
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Hash Functions 

It is the job of the hash function to map keys to 
integers. A good hash function: 

1. Is cheap to evaluate 

2. Tends to use all positions from 0...M-1 with 
uniform frequency. 

3. Tends to put similar keys in different parts of the 
tables (Remember the Shifletts!!) 

The first step is usually to map the key to a big integer, 
for example 

h = \sum_{i=0}^{keylength} 128^i \times char(key[i]) 

This large number must be reduced to an integer between 
0 and M - 1, where M is the size of our hash table. 

One way is by h(k) = k mod M, where M is best a 
large prime not too close to 2^i - 1, which would just 
mask off the high bits. 

This works on the same principle as a roulette wheel! 
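
The two steps above (key to big integer, then reduction mod M) might look like this in Python. It is only a sketch: the function names `string_to_int` and `hash_key` are made up for illustration, and the table size 101 in the usage note is an arbitrary small prime.

```python
def string_to_int(key):
    """Treat the characters of key as digits of a base-128 number."""
    return sum(128 ** i * ord(ch) for i, ch in enumerate(key))


def hash_key(key, m):
    """Reduce the big integer to a table position in 0 .. m-1."""
    return string_to_int(key) % m
```

For example, `hash_key("ab", 101)` first forms 97 + 128 * 98 = 12641 and then reduces it modulo 101.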



Good and Bad Hash functions 



The first three digits of the Social Security Number 



The last three digits of the Social Security Number 



The Birthday Paradox 

No matter how good our hash function is, we had bet- 
ter be prepared for collisions, because of the birthday 
paradox. 






The probability of there being no collisions after n 
insertions into an m-element table is 

(m/m) \times ((m-1)/m) \times \cdots \times ((m-n+1)/m) = \prod_{i=0}^{n-1} (m-i)/m 

When m = 366, this probability sinks below 1/2 when 
n = 23 and to almost 0 when n > 50. 
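
The product is easy to evaluate numerically. The following sketch (the function name `no_collision_probability` is an assumption) multiplies the factors one at a time:

```python
def no_collision_probability(n, m):
    """Probability that n uniform random insertions into an
    m-slot table all land in distinct slots."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p
```

Evaluating it for m = 366 shows the value crossing 1/2 between n = 22 and n = 23.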






Collision Resolution by 
Chaining 

The easiest approach is to let each element in the hash 
table be a pointer to a list of keys. 

Insertion, deletion, and query reduce to the problem in 
linked lists. If the n keys are distributed uniformly in a 
table of size m, each operation takes O(n/m) time. 

Chaining is easy, but devotes a considerable amount of 
memory to pointers, which could be used to make the 
table larger. Still, it is my preferred method. 
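
A minimal sketch of chaining, assuming Python lists stand in for the linked lists and Python's built-in `hash` stands in for the hash function (both are substitutions made for brevity, not part of the lecture):

```python
class ChainedHashTable:
    """Hash table where each slot holds a list (chain) of keys."""
    def __init__(self, m=11):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _bucket(self, key):
        # Built-in hash() reduced mod m picks the chain.
        return self.buckets[hash(key) % self.m]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)

    def search(self, key):
        # Cost is proportional to the length of one chain.
        return key in self._bucket(key)

    def delete(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)
```

Keys such as 3, 14 and 25 all collide in an 11-slot table, yet each remains findable because it sits in the same chain.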



Open Addressing 

We can dispense with all these pointers by using an 
implicit reference derived from a simple function: 

[Figure: an open addressing table with slots 1 to 11; an X marks each occupied slot.] 









If the space we want to use is filled, we can examine 
the remaining locations: 

1. Sequentially h, h + 1, h + 2, ... 

2. Quadratically h, h + 1^2, h + 2^2, h + 3^2, ... 

3. Linearly h, h + k, h + 2k, h + 3k, ... 



The reason for using a more complicated scheme is to 
avoid long runs from similarly hashed keys. 

Deletion in an open addressing scheme is ugly, since 
removing one element can break a chain of insertions, 
making some elements inaccessible. 



Performance on Set 
Operations 

With either chaining or open addressing: 

• Search - O(1) expected, O(n) worst case 

• Insert - O(1) expected, O(n) worst case 

• Delete - O(1) expected, O(n) worst case 

• Min, Max and Predecessor, Successor - Θ(n + m) 
expected and worst case 

Pragmatically, a hash table is often the best data 
structure to maintain a dictionary. However, we will not 
use it much in proving the efficiency of our algorithms, 
since the worst-case time is unpredictable. 

The best worst-case bounds come from balanced binary 
trees, such as red-black trees. 



For each of the four types of linked lists in the following 
table, what is the asymptotic worst-case running time 
for each dynamic-set operation listed? 





                    singly      singly      doubly      doubly 
                    unsorted    sorted      unsorted    sorted 
Search(L, k)        O(N)        O(N)        O(N)        O(N)- 
Insert(L, x)        O(1)        O(N)        O(1)        O(N)- 
Delete(L, x)        O(N)*       O(N)*       O(1)        O(1) 
Successor(L, x)     O(N)        O(1)        O(N)        O(1) 
Predecessor(L, x)   O(N)        O(N)        O(N)        O(1) 
Minimum(L)          O(N)        O(1)        O(N)        O(1) 
Maximum(L)          O(N)        O(1)+       O(N)        O(1)+ 

(*) I need a pointer to the predecessor! 
(+) I need a pointer to the tail! 
(-) Only bottlenecks in otherwise perfect dictionary! 



Binary Search Trees 

"I think that I shall never see 
a poem as lovely as a tree Poem's 
are wrote by fools like me but only 
G-d can make a tree " 
- Joyce Kilmer 

Binary search trees provide a data structure which ef- 
ficiently supports all six dictionary operations. 

A binary tree is a rooted tree where each node contains 
at most two children. 

Each child can be identified as either a left or right 
child. 




A binary tree can be implemented where each node 
has left and right pointer fields, an (optional) parent 
pointer, and a data field. 



Binary Search Trees 



A binary search tree labels each node in a binary tree 
with a single key such that for any node x, all nodes 
in the left subtree of x have keys < x and all nodes in 
the right subtree of x have keys > x. 





Left: A binary search tree. Right: A heap but not a 
binary search tree. 

The search tree labeling enables us to find where any 
key is. Start at the root - if that is not the one we want, 
search either left or right depending upon whether what 
we want is < or > than the root. 



Searching in a Binary Tree 

Dictionary search operations are easy in binary trees 

TREE-SEARCH(x, k) 
    if (x = NIL) or (k = key[x]) 
        then return x 
    if (k < key[x]) 
        then return TREE-SEARCH(left[x],k) 
        else return TREE-SEARCH(right[x],k) 



The algorithm works because both the left and right 
subtrees of a binary search tree are binary search trees 
- recursive structure, recursive algorithm. 

This takes time proportional to the height of the tree, O(h). 



Maximum and Minimum 

Where are the maximum and minimum elements in a 
binary tree? 





TREE-MAXIMUM(x) 
    while right[x] <> NIL 
        do x = right[x] 
    return x 

TREE-MINIMUM(x) 
    while left[x] <> NIL 
        do x = left[x] 
    return x 



Both take time proportional to the height of the tree, 
O(h). 



Where is the predecessor? 

Where is the predecessor of a node in a tree, assuming 
all keys are distinct? 




PREDECESSOR(X) SUCCESSOR(X) 



If X has two children, its predecessor is the maximum 
value in its left subtree and its successor the minimum 
value in its right subtree. 



What if a node doesn't have 

children? 



[Figure: predecessor(x) for a node x with no left child.] 





If it does not have a left child, a node's predecessor is 
its first left ancestor. 

The proof of correctness comes from looking at the 
in-order traversal of the tree. 

Tree-Successor(x) 
    if right[x] <> NIL 
        then return Tree-Minimum(right[x]) 
    y <- p[x] 
    while (y <> NIL) and (x = right[y]) 
        do x <- y 
           y <- p[y] 
    return y 



Tree predecessor/successor both run in time propor- 
tional to the height of the tree. 
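
The search, minimum/maximum, and successor routines can be sketched together in Python. The `Node` class with a `parent` field is an assumption made for this illustration; its constructor wires up parent pointers so a small tree can be written as a nested expression.

```python
class Node:
    """Binary tree node with parent pointer (illustrative layout)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def tree_search(x, k):
    # Iterative form of TREE-SEARCH: go left or right by comparison.
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    while x.right is not None:
        x = x.right
    return x


def tree_successor(x):
    # Case 1: the successor is the minimum of the right subtree.
    if x.right is not None:
        return tree_minimum(x.right)
    # Case 2: climb until we leave a left subtree.
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y
```

Each routine follows a single root-to-leaf (or leaf-to-root) path, which is where the O(h) bound comes from.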



In-Order Traversal 




Inorder-Tree-Walk(x) 
    if (x <> NIL) 
        then Inorder-Tree-Walk(left[x]) 
             print key[x] 
             Inorder-Tree-Walk(right[x]) 



A-B-C-D-E-F-G-H 



Tree Insertion 

Do a binary search to find where it should be, then 
replace the termination NIL pointer with the new item. 




Tree-Insert(T, z) 
    y = NIL 
    x = root[T] 
    while x <> NIL 
        do y <- x 
           if key[z] < key[x] 
               then x <- left[x] 
               else x <- right[x] 
    p[z] <- y 
    if y = NIL 
        then root[T] <- z 
        else if key[z] < key[y] 
            then left[y] <- z 
            else right[y] <- z 



y is maintained as the parent of x, since x eventually 
becomes NIL. 

The final test establishes whether the NIL was a left 
or right turn from y. 



Insertion takes time proportional to the height of the 
tree, O(h). 
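
A Python sketch of insertion, paired with an in-order walk to check the result. The `Tree` wrapper class and the `height` helper are assumptions added for the illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class Tree:
    """Holds the root pointer, playing the role of root[T]."""
    def __init__(self):
        self.root = None


def tree_insert(tree, z):
    # y trails x as its parent while x walks down to a NIL slot.
    y, x = None, tree.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:
        tree.root = z          # tree was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def inorder(x, out):
    # Left subtree, then the node, then the right subtree.
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


def height(x):
    # Number of edges on the longest downward path; -1 for an empty tree.
    return -1 if x is None else 1 + max(height(x.left), height(x.right))
```

Inserting keys in sorted order produces a chain, so the same code also shows how the height can degrade to n - 1.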



Tree Deletion 

Deletion is somewhat more tricky than insertion, 
because the node to die may not be a leaf, and thus affect 
other nodes. 

Case (a), where the node is a leaf, is simple - just NIL 
out the parent's child pointer. 

Case (b), where a node has one child, the doomed node 
can just be cut out. 

Case (c), relabel the node as its successor (which has 
at most one child when z has two children!) and delete 
the successor! 

This implementation of deletion assumes parent point- 
ers to make the code nicer, but if you had to save space 
they could be dispensed with by keeping the pointers 
on the search path stored in a stack. 

Tree-Delete(T,z) 
    if (left[z] = NIL) or (right[z] = NIL) 
        then y <- z 
        else y <- Tree-Successor(z) 
    if left[y] <> NIL 
        then x <- left[y] 
        else x <- right[y] 
    if x <> NIL 
        then p[x] <- p[y] 
    if p[y] = NIL 
        then root[T] <- x 
        else if (y = left[p[y]]) 
            then left[p[y]] <- x 
            else right[p[y]] <- x 
    if (y <> z) 
        then key[z] <- key[y] 
             /* If y has other fields, copy them, too. */ 
    return y 



Lines 1-3 determine which node y is physically removed. 

Lines 4-6 identify x as the non-nil descendant, if any. 

Lines 7-8 give x a new parent. 

Lines 9-10 modify the root node, if necessary. 

Lines 11-13 reattach the subtree, if necessary. 

Lines 14-16: if the removed node y is not z itself, copy 
the contents of y into z. 

Conclusion: deletion takes time proportional to the 
height of the tree. 
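
The deletion procedure translates to Python almost line for line. In this sketch the `Node`, `Tree`, and `tree_insert` definitions are scaffolding assumed for the example; since the successor is only needed when z has two children, it is computed here as the minimum of the right subtree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(tree, z):
    y, x = None, tree.root
    while x is not None:
        y, x = x, (x.left if z.key < x.key else x.right)
    z.parent = y
    if y is None:
        tree.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_delete(tree, z):
    # y is the node physically removed: z itself, or z's successor.
    y = z if z.left is None or z.right is None else tree_minimum(z.right)
    # x is y's only child, if it has one.
    x = y.left if y.left is not None else y.right
    if x is not None:
        x.parent = y.parent
    if y.parent is None:
        tree.root = x
    elif y is y.parent.left:
        y.parent.left = x
    else:
        y.parent.right = x
    if y is not z:
        z.key = y.key      # relabel z with its successor's key
    return y


def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out
```

The three cases (leaf, one child, two children) all pass through the same code path; only the choice of y differs.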



Balanced Search Trees 

All six of our dictionary operations, when implemented 
with binary search trees, take O(h), where h is the 
height of the tree. 

The best height we could hope to get is lg n, if the tree 
was perfectly balanced, since 

\sum_{i=0}^{\lfloor \lg n \rfloor} 2^i \approx n 



But if we get unlucky with our order of insertion or 
deletion, we could get linear height! 

Insert(a) 
Insert(b) 
Insert(c) 
Insert(d) 




In fact, random search trees on average have Θ(lg N) 
height, but we are worried about worst case height. 

We can't easily use randomization - Why? 



Perfectly Balanced Trees 

Perfectly balanced trees require a lot of work to main- 
tain: 




If we insert the key 1, we must move every single node 
in the tree to rebalance it, taking Θ(n) time. 

Therefore, when we talk about "balanced" trees, we 
mean trees whose height is O(lg n), so all dictionary 
operations (insert, delete, search, min/max, 
successor/predecessor) take O(lg n) time. 

Red-Black trees are binary search trees where each 
node is assigned a color, where the coloring scheme 
helps us maintain the height as Θ(lg n). 



Red-Black Tree Definition 

Red-black trees have the following properties: 

1. Every node is colored either red or black. 

2. Every leaf (NIL pointer) is black. 

3. If a node is red then both its children are black. 

4. Every single path from a node to a descendant leaf 
contains the same number of black nodes. 



What does this mean? 

If the root of a red-black tree is black can we just color 
it red? 

No! For one of its children might be red. 



If an arbitrary node is red can we color it black? 

No! Because now all nodes may not have the same 
black height. 



What tree maximizes the number of nodes in a tree of 
black height h? 



What does a red-black tree with two real nodes look 
like? 





Not (1) - consecutive reds Not (2), (4) - Non-Uniform 
black height 



Red-Black Tree Height 

Lemma: A red-black tree with n internal nodes has 
height at most 2 lg(n + 1). 

Proof: Our strategy: first we bound the number of 
nodes in any subtree, then we bound the height of any 
subtree. 

We claim that any subtree rooted at x has at least 
2^{bh(x)} - 1 internal nodes, where bh(x) is the black 
height of node x. 

Proof, by induction: 

bh(x) = 0 -> x is a leaf -> 2^0 - 1 = 0 

Now assume it is true for all trees with black height 
< bh(x). 

If x is black, both subtrees have black height bh(x) - 1. 
If x is red, the subtrees have black height bh(x). 

Therefore, the number of internal nodes in any subtree 
is 

n \geq (2^{bh(x)-1} - 1) + (2^{bh(x)-1} - 1) + 1 \geq 2^{bh(x)} - 1 

Now, let h be the height of our red-black tree. At 
least half the nodes on any single path from root to 
leaf must be black if we ignore the root. 

Thus bh(x) ≥ h/2 and n ≥ 2^{h/2} - 1, so n + 1 ≥ 2^{h/2}. 

This implies that lg(n + 1) ≥ h/2, so h ≤ 2 lg(n + 1). 

Therefore red-black trees have height at most twice 
optimal. We have a balanced search tree if we can 
maintain the red-black tree structure under insertion 
and deletion. 



Show that any n-node tree can be transformed to any 
other using O(n) rotations (hint: convert to a right 
going chain). 



I will start by showing weaker bounds - that O(n^2) 
and O(n log n) rotations suffice - because that is how I 
proceeded when I first saw the problem. 

First, observe that creating a right-going chain from 
t1, and then reversing the same construction for t2, 
gives a path from t1 to t2. 

Note that it will take at most n rotations to make 
the lowest valued key the root. Once it is root, all 
keys are to the right of it, so no more rotations need 
go through it to create a right-going chain. Repeating 
with the second lowest key, third, etc. gives that O(n^2) 
rotations suffice. 

Now suppose we try to create a completely balanced tree 
instead. To get the n/2 key to the root takes at most 
n rotations. Now each subtree has half the nodes and 
we can recur... 

N 



N/2 N/2 





N/4 N/4 N/4 N/4 



To get a linear algorithm, we must beware of trees like: 









The correct answer is that n — 1 rotations suffice to 
get to a rightmost chain. 

By picking the lowest node on the rightmost chain 
which has a left ancestor, we can add one node per 
rotation to the right most chain! 





Initially, the rightmost chain contained at least 1 node, 
so after n— 1 rotations it contains all n. Slick! 



Given an element x in an n-node order-statistic binary 
tree and a natural number i, how can the ith successor 
of x be determined in O(lg n) time? 



This problem can be solved if our data structure 
supports two operations: 

• Rank(x) - what is the position of x in the total 
order of keys? 

• Get(i) - what is the key in the ith position of the 
total order of keys? 

What we are interested in is Get(Rank(x) + i). 

In an order statistic tree, each node x is labeled with 
the number of nodes contained in the subtree rooted 
in x. 




Implementing both operations involves keeping track 
of how many nodes lie to the left of our path. 



Why don't CS profs ever stop 
talking about sorting?! 

1. Computers spend more time sorting than anything 
else, historically 25% on mainframes. 

2. Sorting is the best studied problem in computer 
science, with a variety of different algorithms known 

3. Most of the interesting ideas we will encounter in 
the course can be taught in the context of sort- 
ing, such as divide-and-conquer, randomized algo- 
rithms, and lower bounds. 

You should have seen most of the algorithms - we will 
concentrate on the analysis. 



Applications of Sorting 

One reason why sorting is so important is that once 
a set of items is sorted, many other problems become 
easy. 



Searching 

Binary search lets you test whether an item is in a 
dictionary in O(lg n) time. 

Speeding up searching is perhaps the most important 
application of sorting. 



Closest pair 



Given n numbers, find the pair which are closest to 
each other. 

Once the numbers are sorted, the closest pair will be 
next to each other in sorted order, so an O(n) linear 
scan completes the job. 
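
A sort-then-scan solution is only a couple of lines in Python. This is a sketch under the assumption that the input holds at least two numbers; the function name `closest_pair` is introduced here for illustration.

```python
def closest_pair(numbers):
    """Return the two values with the smallest difference.

    Sorting costs O(n lg n); the scan over adjacent pairs is O(n).
    Assumes len(numbers) >= 2.
    """
    s = sorted(numbers)
    return min(zip(s, s[1:]), key=lambda pair: pair[1] - pair[0])
```

The same scan answers element uniqueness: the items are all distinct exactly when the closest pair has a nonzero difference.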



Element uniqueness 

Given a set of n items, are they all unique or are there 
any duplicates? 

Sort them and do a linear scan to check all adjacent 
pairs. 

This is a special case of closest pair above. 



Frequency distribution - Mode 

Given a set of n items, which element occurs the largest 
number of times? 

Sort them and do a linear scan to measure the length 
of all adjacent runs. 



Median and Selection 

What is the kth largest item in the set? 

Once the keys are placed in sorted order in an array, 
the kth largest can be found in constant time by simply 
looking in the kth position of the array. 



Convex hulls 

Given n points in two dimensions, find the smallest area 
polygon which contains them all. 




The convex hull is like a rubber band stretched over 
the points. 

Convex hulls are the most important building block for 
more sophisticated geometric algorithms. 

Once you have the points sorted by x-coordinate, they 
can be inserted from left to right into the hull, since 
the rightmost point is always on the boundary. 

Without sorting the points, we would have to check 
whether the point is inside or outside the current hull. 

Adding a new rightmost point might cause others to 
be deleted. 



Huffman codes 

If you are trying to minimize the amount of space a 
text file is taking up, it is silly to assign each letter the 
same length (ie. one byte) code. 

Example: e is more common than q, a is more common 
than z. 

If we were storing English text, we would want a and 
e to have shorter codes than q and z. 

To design the best possible code, the first and most 
important step is to sort the characters in order of 
frequency of use. 



Character   Frequency   Code 
f           5           1100 
e           9           1101 
c           12          100 
b           13          101 
d           16          111 
a           45          0 






Selection Sort 

A simple O(n^2) sorting algorithm is selection sort. 

Sweep through all the elements to find the smallest 
item, then the smallest remaining item, etc. until the 
array is sorted. 

Selection-sort(A) 
    for i = 1 to n 
        for j = i + 1 to n 
            if (A[j] < A[i]) then swap(A[i],A[j]) 

It is clear this algorithm must be correct from an 
inductive argument, since after the ith pass the ith 
element is in its correct position. 

It is clear that this algorithm takes O(n^2) time. 

It is clear that the analysis of this algorithm cannot 
be improved because there will be n/2 iterations which 
will require at least n/2 comparisons each, so at least 
n^2/4 comparisons will be made. More careful analysis 
doubles this. 

Thus selection sort runs in Θ(n^2) time. 
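
A direct Python rendering of the pseudocode, instrumented to count comparisons so the quadratic bound can be observed. Returning the count alongside the sorted list is an assumption made for the illustration.

```python
def selection_sort(items):
    """Sort a copy of items; also report how many comparisons ran."""
    a = list(items)
    n = len(a)
    comparisons = 0
    for i in range(n):
        # After this pass, a[i] holds the smallest of a[i:].
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[i]:
                a[i], a[j] = a[j], a[i]
    return a, comparisons
```

The inner loop always runs to completion, so the count is n(n-1)/2 on every input of size n, about twice the n^2/4 lower bound argued above.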



Binary Heaps 



A binary heap is defined to be a binary tree with a key 
in each node such that: 

1. All leaves are on, at most, two adjacent levels. 

2. All leaves on the lowest level occur to the left, 
and all levels except the lowest one are completely 
filled. 

3. The key in root is ≥ all its children, and the left 
and right subtrees are again binary heaps. 



Conditions 1 and 2 specify shape of the tree, and con- 
dition 3 the labeling of the tree. 




The ancestor relation in a heap defines a partial order 
on its elements, which means it is reflexive, 
anti-symmetric, and transitive. 

1. Reflexive: x is an ancestor of itself. 

2. Anti-symmetric: if x is an ancestor of y and y is 
an ancestor of x, then x = y. 

3. Transitive: if x is an ancestor of y and y is an 
ancestor of z, x is an ancestor of z. 

Partial orders can be used to model hierarchies with 
incomplete information or equal-valued elements. One 
of my favorite games with my parents is fleshing out 
the partial order of "big" old-time movie stars. 

The partial order defined by the heap structure is weaker 
than that of the total order, which explains 

1. Why it is easier to build. 

2. Why it is less useful than sorting (but still very 
important). 



Constructing Heaps 

Heaps can be constructed incrementally, by inserting 
new elements into the left-most open spot in the array. 



If the new element is greater than its parent, swap their 
positions and recur. 

Since at each step, we replace the root of a subtree by 
a larger one, we preserve the heap order. 

Since all but the last level is always filled, the height h 
of an n element heap is bounded because: 

\sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \geq n 

so h = ⌊lg n⌋. 

Doing n such insertions takes Θ(n log n), since the last 
n/2 insertions require O(log n) time each. 



Heapify 



The bottom up insertion algorithm gives a good way 
to build a heap, but Robert Floyd found a better way, 
using a merge procedure called heapify. 

Given two heaps and a fresh element, they can be 
merged into one by making the new one the root and 
trickling down. 

Build-heap(A) 
    n = |A| 
    for i = ⌊n/2⌋ downto 1 do 
        Heapify(A,i) 



Heapify(A,i) 
    left = 2i 
    right = 2i + 1 
    if (left ≤ n) and (A[left] > A[i]) then 
        max = left 
    else max = i 
    if (right ≤ n) and (A[right] > A[max]) then 
        max = right 
    if (max ≠ i) then 
        swap(A[i],A[max]) 
        Heapify(A,max) 
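
In Python the same procedures might read as follows. Note the assumption that arrays are 0-indexed here, so the children of position i sit at 2i+1 and 2i+2 rather than 2i and 2i+1; `is_heap` is a hypothetical checker added for the example.

```python
def heapify(a, i, n):
    """Trickle a[i] down until the subtree rooted at i is a max-heap.

    Assumes both child subtrees of i are already heaps; n is the heap size.
    """
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)


def build_heap(a):
    """Heapify every internal node, from the last one up to the root."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        heapify(a, i, n)


def is_heap(a):
    # Every element must be no larger than its parent.
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))
```

The leaves (the last half of the array) are skipped because a single element is already a heap.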



Rough Analysis of Heapify 

Heapify on a subtree containing n nodes takes 

T(n) ≤ T(2n/3) + O(1) 

The 2/3 comes from merging heaps whose levels differ 
by one. The last row could be exactly half filled. 
Besides, the asymptotic answer won't change so long 
as the fraction is less than one. 

Solve the recurrence using the Master Theorem. 

Let a = 1, b = 3/2 and f(n) = 1. 

Note that Θ(n^{log_{3/2} 1}) = Θ(1), since log_{3/2} 1 = 0. 

Thus Case 2 of the Master Theorem applies, giving 
T(n) = Θ(lg n). 



The Master Theorem: Let a ≥ 1 and b > 1 be constants, let f(n) 
be a function, and let T(n) be defined on the nonnegative integers 
by the recurrence 

T(n) = aT(n/b) + f(n) 

where we interpret n/b to mean either ⌊n/b⌋ or ⌈n/b⌉. Then T(n) 
can be bounded asymptotically as follows: 

1. If f(n) = O(n^{log_b a - ε}) for some constant ε > 0, then T(n) = 
Θ(n^{log_b a}). 

2. If f(n) = Θ(n^{log_b a}), then T(n) = Θ(n^{log_b a} lg n). 

3. If f(n) = Ω(n^{log_b a + ε}) for some constant ε > 0, and if af(n/b) ≤ 
cf(n) for some constant c < 1 and all sufficiently large n, 
then T(n) = Θ(f(n)). 



Exact Analysis of Heapify 

In fact, building a heap with Heapify performs better 
than O(n log n), because most of the heaps we merge are 
extremely small. 




In a full binary tree on n nodes, there are n/2 nodes 
which are leaves (i.e. height 0), n/4 nodes which are 
height 1, n/8 nodes which are height 2, ... 

In general, there are at most ⌈n/2^{h+1}⌉ nodes of height 
h, so the cost of building a heap is: 

\sum_{h=0}^{\lfloor \lg n \rfloor} \lceil n/2^{h+1} \rceil O(h) = O(n \sum_{h=0}^{\lfloor \lg n \rfloor} h/2^h) 

Since this sum is not quite a geometric series, we can't 
apply the usual identity to get the sum. But it should 
be clear that the series converges. 



Proof of Convergence 

Series convergence is the "free lunch" of algorithm 
analysis. 

The identity for the sum of a geometric series is 

\sum_{k=0}^{\infty} x^k = \frac{1}{1-x} 

If we take the derivative of both sides, 

\sum_{k=0}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2} 

Multiplying both sides of the equation by x gives the 
identity we need: 

\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2} 

Substituting x = 1/2 gives a sum of 2, so Build-heap 
uses at most 2n comparisons and thus linear time. 
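
As a quick numerical sanity check of the identity (not a proof), one can sum the first few hundred terms in Python. The function name `partial_sum` is an assumption for this sketch.

```python
def partial_sum(x, terms):
    """Sum of k * x**k for k = 0 .. terms-1."""
    return sum(k * x ** k for k in range(terms))


def closed_form(x):
    """The value x / (1 - x)^2 predicted by the identity, for |x| < 1."""
    return x / (1 - x) ** 2
```

For x = 1/2 the partial sums settle at 2, matching the closed form, because the tail terms shrink geometrically.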



The Lessons of Heapsort, I 

"Are we doing a careful analysis? Might our algorithm 
be faster than it seems?" 

Typically in our analysis, we will say that since we are 
doing at most x operations of at most y time each, the 
total time is O(xy). 

However, if we overestimate too much, our bound may 
not be as tight as it should be! 



Heapsort 



Heapify can be used to construct a heap, using the 
observation that an isolated element forms a heap of 
size 1. 

Heapsort(A) 
    Build-heap(A) 
    for i = n downto 1 do 
        swap(A[1],A[i]) 
        n = n - 1 
        Heapify(A,1) 



If we construct our heap from bottom to top using 
Heapify, we do not have to do anything with the last 
n/2 elements. 

With the implicit tree defined by array positions (i.e. 
the ith position is the parent of the 2ith and (2i+1)st 
positions), the leaves start out as heaps. 

Exchanging the maximum element with the last element 
and calling Heapify repeatedly gives an O(n lg n) 
sorting algorithm, named Heapsort. 
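
Putting the pieces together, a compact Heapsort might look like this in Python. It is a sketch with 0-indexed arrays and an iterative trickle-down (`sift_down`, a name chosen here), rather than a transcription of the recursive pseudocode.

```python
def heapsort(items):
    """Return a sorted copy of items using an in-place max-heap."""
    a = list(items)
    n = len(a)

    def sift_down(i, size):
        # Move a[i] down until neither child within a[:size] is larger.
        while True:
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < size and a[left] > a[largest]:
                largest = left
            if right < size and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Build the heap bottom-up; leaves are already heaps.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Repeatedly move the maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```

Each of the n - 1 extractions costs O(lg n), which together with the linear-time build gives the O(n lg n) total.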



Heapsort Animations 



The Lessons of Heapsort, II 

Always ask yourself, "Can we use a different data struc- 
ture?" 

Selection sort scans through the entire array, 
repeatedly finding the smallest remaining element. 

For i = 1 to n 

A: Find the smallest of the first n - i + 1 items. 

B: Pull it out of the array and put it first. 



Using arrays or unsorted linked lists as the data 
structure, operation A takes O(n) time and operation B 
takes O(1). 

Using heaps, both of these operations can be done 
within O(lg n) time, balancing the work and achieving 
a better tradeoff. 



Priority Queues 



A priority queue is a data structure on sets of keys 
supporting the following operations: 

• Insert (S, x) - insert x into set S 

• Maximum(S) - return the largest key in S 

• ExtractMax(S) - return and remove the largest key 

in S 

These operations can be easily supported using a heap. 

• Insert - use the trickle up insertion in O(log n). 

• Maximum - read the first element in the array in 
O(1). 

• Extract-Max - delete first element, replace it with 
the last, decrement the element counter, then heapify 
in O(log n). 
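
For illustration, the three operations can be layered on Python's standard `heapq` module. Since `heapq` maintains a min-heap, this sketch stores negated keys to obtain max-first behavior; that trick and the class name `MaxPriorityQueue` are assumptions of the example, and it only works for numeric keys.

```python
import heapq


class MaxPriorityQueue:
    """Max-priority queue of numbers built on heapq's min-heap."""
    def __init__(self):
        self._heap = []

    def insert(self, x):
        # O(log n): push the negated key so the largest x is on top.
        heapq.heappush(self._heap, -x)

    def maximum(self):
        # O(1): the root of the heap is the first array element.
        return -self._heap[0]

    def extract_max(self):
        # O(log n): remove the root and restore heap order.
        return -heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)
```

Repeatedly calling `extract_max` until the queue is empty yields the keys in decreasing order, which is exactly Heapsort.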



Applications of Priority Queues 



Heaps as stacks or queues 

• In a stack, push inserts a new item and pop re- 
moves the most recently pushed item. 

• In a queue, enqueue inserts a new item and de- 
queue removes the least recently enqueued item. 

Both stacks and queues can be simulated by using a 
heap, when we add a new time field to each item and 
order the heap according to this time field. 

• To simulate the stack, increment the time with 
each insertion and put the maximum on top of 
the heap. 

• To simulate the queue, decrement the time with 
each insertion and put the maximum on top of the 
heap (or increment times and keep the minimum 
on top) 

This simulation is not as efficient as a normal stack/queue 
implementation, but it is a cute demonstration of the 
flexibility of a priority queue. 



Discrete Event Simulations 

In simulations of airports, parking lots, and jai-alai - 
priority queues can be used to maintain who goes next. 



The stack and queue orders are just special cases of 
orderings. In real life, certain people cut in line. 



Sweepline Algorithms in 
Computational Geometry 




In the priority queue, we will store the points we have 
not yet encountered, ordered by x coordinate, and 
push the line forward one stop at a time. 



Greedy Algorithms 

In greedy algorithms, we always pick the next thing 
which locally maximizes our score. By placing all the 
things in a priority queue and pulling them off in or- 
der, we can improve performance over linear search or 
sorting, particularly if the weights change. 

Example: Sequential strips in triangulations. 



Danny Heep 



Show that an n-element heap has height ⌊lg n⌋. 



Since it is a balanced binary tree, the height of a heap 
is clearly O(lg n), but the problem asks for an exact 
answer. 

The height is defined as the number of edges in the 
longest simple path from the root. 




The number of nodes in a complete balanced binary 
tree of height h is 2^{h+1} - 1. 



Thus the height increases only when n = 2^{lg n}, or in 
other words when lg n is an integer. 



Is a reverse sorted array a heap? 



In a heap, each element is greater than or equal to 
each of its descendants. 

In the array representation of a heap, the children 
of the ith element are the 2ith and (2i+1)th elements. 

If A is sorted in reverse order, then i < j implies 
that A[i] ≥ A[j]. 

Since 2i > i and 2i + 1 > i, then A[2i] ≤ A[i] and 
A[2i+1] ≤ A[i]. 

Thus by definition A is a heap! 



Quicksort 



Although mergesort is O(n lg n), it is quite inconvenient 
for implementation with arrays, since we need space to 
merge. 

In practice, the fastest sorting algorithm is Quicksort, 
which uses partitioning as its main idea. 

Example: Pivot about 10. 

17 12 6 19 23 8 5 10 - before 

6 8 5 10 23 19 12 17 - after 

Partitioning places all the elements less than the pivot 
in the left part of the array, and all elements greater 
than the pivot in the right part of the array. The pivot 
fits in the slot between them. 

Note that the pivot element ends up in the correct 
place in the total order! 



Partitioning the elements 

Once we have selected a pivot element, we can parti- 
tion the array in one linear scan, by maintaining three 
sections of the array: < pivot, > pivot, and unexplored. 

Example: pivot about 10 

— 17 12 6 19 23 8 5 — 10 

— 5 12 6 19 23 8 — 17 
5 — 12 6 19 23 8 — 17 
5 — 8 6 19 23 — 12 17 
5 8 — 6 19 23 — 12 17 
5 8 6 — 19 23 — 12 17 
5 8 6 — 23 — 19 12 17 
5 8 6 23 19 12 17 

5 8 6 10 19 12 17 23 

As we scan from left to right, we move the left bound 
to the right when the element is less than the pivot, 
otherwise we swap it with the rightmost unexplored 
element and move the right bound one step closer to 
the left. 



Since the partitioning step consists of at most n swaps, 
it takes time linear in the number of keys. But what does 
it buy us? 

1. The pivot element ends up in the position it retains 
in the final sorted order. 

2. After a partitioning, no element flops to the other 
side of the pivot in the final sorted order. 

Thus we can sort the elements to the left of the pivot 
and the right of the pivot independently! 

This gives us a recursive sorting algorithm, since we 
can use the partitioning approach to sort each sub- 
problem. 



Quicksort Animations 



Pseudocode 



Sort(A) 
    Quicksort(A,1,n) 



Quicksort(A, low, high) 
    if (low < high) 
        pivot-location = Partition(A, low, high) 
        Quicksort(A, low, pivot-location - 1) 
        Quicksort(A, pivot-location+1, high) 



Partition(A, low, high) 
    pivot = A[low] 
    leftwall = low 
    for i = low+1 to high 
        if (A[i] < pivot) then 
            leftwall = leftwall+1 
            swap(A[i],A[leftwall]) 
    swap(A[low],A[leftwall]) 
    return leftwall 
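
The pseudocode carries over to Python with little change. This sketch assumes 0-indexed arrays and makes `partition` return the final pivot position explicitly, since `quicksort` needs it; the default arguments are a convenience added here.

```python
def partition(a, low, high):
    """Partition a[low..high] around the pivot a[low].

    Returns the index where the pivot ends up; everything to its left
    is smaller than the pivot, everything to its right is not.
    """
    pivot = a[low]
    leftwall = low
    for i in range(low + 1, high + 1):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    a[low], a[leftwall] = a[leftwall], a[low]
    return leftwall


def quicksort(a, low=0, high=None):
    """Sort the list a in place."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)
        quicksort(a, p + 1, high)
```

Because the pivot is always the first element of the subarray, this version hits its quadratic worst case on already sorted input.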



Best Case for Quicksort 



Since each element ultimately ends up in the correct 
position, the algorithm correctly sorts. But how long 
does it take? 

The best case for divide-and-conquer algorithms comes 
when we split the input as evenly as possible. Thus in 
the best case, each subproblem is of size n/2. 

The partition step on each subproblem is linear in its 
size. Thus the total effort in partitioning the 2^k 
problems of size n/2^k is O(n). 

The recursion tree for the best case looks like this: 



The total partitioning on each level is O(n), and it takes 
lg n levels of perfect partitions to get to single element 
subproblems. When we are down to single elements, 
the problems are sorted. Thus the total time in the 
best case is O(n lg n). 



Worst Case for Quicksort 



Suppose instead our pivot element splits the array as 
unequally as possible. Thus instead of n/2 elements in 
the smaller half, we get zero, meaning that the pivot 
element is the biggest or smallest element in the array. 






Now we have n-1 levels, instead of lg n, for a worst 
case time of Θ(n^2), since the first n/2 levels each have 
≥ n/2 elements to partition. 

Thus the worst case time for Quicksort is worse than 
Heapsort or Mergesort. 

To justify its name, Quicksort had better be good in 
the average case. Showing this requires some fairly 
intricate analysis. 

The divide and conquer principle applies to real life. If 
you will break a job into pieces, it is best to make the 
pieces of equal size! 



Intuition: The Average Case 

for Quicksort 

Suppose we pick the pivot element at random in an 
array of n keys. 



n/4 



n/2 



3n/4 



Half the time, the pivot element will be from the center 
half of the sorted array. 

Whenever the pivot element is from positions n/4 to 
3n/4, the larger remaining subarray contains at most 
3n/4 elements. 

If we assume that the pivot element is always in this 
range, what is the maximum number of partitions we 
need to get from n elements down to 1 element? 



(3/4)^l · n = 1  ->  n = (4/3)^l 



lg n = l · lg(4/3) 



Therefore l = lg n / lg(4/3) ≈ 2.41 lg n good partitions suffice. 



What have we shown? 



About 2.41 lg n levels of decent partitions suffice to 
sort an array of n elements. 

But how often when we pick an arbitrary element as 
pivot will it generate a decent partition? 

Since any number ranked between n/4 and 3n/4 would 
make a decent pivot, we get one half the time on 
average. 

If we need about 2.41 lg n levels of decent partitions to 
finish the job, and half of random partitions are decent, 
then on average the recursion tree to quicksort the array 
has fewer than 5 lg n levels. 






Since O(n) work is done partitioning on each level, the 
average time is O(n lg n). 

More careful analysis shows that the expected number 
of comparisons is ≈ 1.38 n lg n. 



Average-Case Analysis of 

Quicksort 

To do a precise average-case analysis of quicksort, we 
formulate a recurrence given the exact expected time 
T(n): 

T(n) = \sum_{p=1}^{n} \frac{1}{n} (T(p-1) + T(n-p)) + n - 1 

Each possible pivot p is selected with equal probability. 
The number of comparisons needed to do the partition 
is n — 1. 

We will need one useful fact about the Harmonic num- 
bers Hn, namely 



H_n = \sum_{i=1}^{n} 1/i \approx \ln n 



It is important to understand (1) where the recurrence 
relation comes from and (2) how the log comes out 
from the summation. The rest is just messy algebra. 



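$$T(n) = \sum_{p=1}^{n} \frac{1}{n}\left(T(p-1) + T(n-p)\right) + n - 1$$

$$T(n) = \frac{2}{n} \sum_{p=1}^{n} T(p-1) + n - 1$$

$$nT(n) = 2 \sum_{p=1}^{n} T(p-1) + n(n-1) \qquad \text{multiply by } n$$

$$(n-1)T(n-1) = 2 \sum_{p=1}^{n-1} T(p-1) + (n-1)(n-2) \qquad \text{apply to } n-1$$

Subtracting the second equation from the first gives 

$$nT(n) - (n-1)T(n-1) = 2T(n-1) + 2(n-1)$$

rearranging the terms gives us: 

$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2(n-1)}{n(n+1)}$$

substituting $a_n = T(n)/(n+1)$ gives 

$$a_n = a_{n-1} + \frac{2(n-1)}{n(n+1)} = \sum_{i=1}^{n} \frac{2(i-1)}{i(i+1)}$$

$$a_n \approx 2 \sum_{i=1}^{n} \frac{1}{i+1} \approx 2 \ln n$$

We are really interested in $T(n)$, so 

$$T(n) = (n+1)a_n \approx 2(n+1) \ln n \approx 1.38\, n \lg n$$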
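
As a sanity check on the algebra, the recurrence can also be evaluated numerically. The sketch below is an illustration added here, not part of the original analysis; the function name expected_comparisons and the base case T(0) = 0 are assumptions. It tabulates T(n) directly from the recurrence and prints its ratio to n lg n.

```python
import math


def expected_comparisons(n_max):
    """Tabulate T(n) = (2/n) * sum_{p=1..n} T(p-1) + (n - 1), assuming T(0) = 0."""
    t = [0.0] * (n_max + 1)
    prefix = 0.0  # running sum T(0) + T(1) + ... + T(n-1)
    for n in range(1, n_max + 1):
        prefix += t[n - 1]
        t[n] = 2.0 * prefix / n + (n - 1)
    return t


if __name__ == "__main__":
    table = expected_comparisons(10000)
    for n in (10, 100, 1000, 10000):
        # Ratio of the exact expectation to n lg n.
        print(n, round(table[n] / (n * math.log2(n)), 3))
```

The printed ratio should stay below 2 ln 2 ≈ 1.386 and creep toward it only slowly, since the approximation above drops lower-order terms that are linear in n.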



What is the Worst Case? 

The worst case for Quicksort depends upon how we 
select our partition or pivot element. If we always select 
either the first or last element of the subarray, the 
worst-case occurs when the input is already sorted! 

A B D F H J K 

B D F H J K 

D F H J K 

F H J K 

H J K 

J K 

K 

Having the worst case occur when the keys are sorted or 
almost sorted is very bad, since that is likely to be the 
case in certain applications. 

To eliminate this problem, pick a better pivot: 

1. Use the middle element of the subarray as pivot. 

2. Use a random element of the array as the pivot. 

3. Perhaps best of all, take the median of three el- 
ements (first, last, middle) as the pivot. Why 
should we use median instead of the mean? 

Whichever of these three rules we use, the worst case 
remains $O(n^2)$. However, because the worst case is 
no longer a natural order it is much less likely to 
occur. 
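
The third rule can be sketched in a few lines. This is a minimal illustration, assuming the subarray is a[lo..hi] inclusive; the helper name median_of_three_index is hypothetical and does not come from the notes.

```python
def median_of_three_index(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi] (hypothetical helper)."""
    mid = (lo + hi) // 2
    # Sort the three (value, index) pairs and keep the middle one.
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]


if __name__ == "__main__":
    already_sorted = [1, 2, 3, 4, 5, 6, 7]
    # On sorted input the chosen pivot is the true middle element.
    print(median_of_three_index(already_sorted, 0, len(already_sorted) - 1))
```

On an already sorted subarray this picks the middle element, so the natural order that hurts the first-element rule becomes a balanced split.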



Is Quicksort really faster than Heapsort? 

Since Heapsort is O(n lg n) and selection sort is $O(n^2)$, 
there is no debate about which will be better for decent- 
sized files. 

But how can we compare two Θ(n lg n) algorithms to 
see which is faster? Using the RAM model and the big 
Oh notation, we can't! 

When Quicksort is implemented well, it is typically 2-3 
times faster than mergesort or heapsort. The primary 
reason is that the operations in the innermost loop are 
simpler. The best way to see this is to implement both 
and experiment with different inputs. 

Since the difference between the two programs will be 
limited to a multiplicative constant factor, the details 
of how you program each algorithm will make a big 
difference. 

If you don't want to believe me when I say Quicksort is 
faster, I won't argue with you. It is a question whose 
solution lies outside the tools we are using. 



Randomization 



Suppose you are writing a sorting program, to run on 
data given to you by your worst enemy. Quicksort is 
good on average, but bad on certain worst-case 
instances. 

If you used Quicksort, what kind of data would your 
enemy give you to run it on? Exactly the worst-case 
instance, to make you look bad. 

But instead of picking the median of three or the first 
element as pivot, suppose you picked the pivot element 
at random. 

Now your enemy cannot design a worst-case instance 
to give to you, because no matter which data they give 
you, you would have the same probability of picking a 
good pivot! 

Randomization is a very important and useful idea. By 
either picking a random pivot or scrambling the 
permutation before sorting it, we can say: 

"With high probability, randomized quicksort 
runs in Θ(n lg n) time." 

Where before, all we could say is: 

"If you give me random input data, quicksort 
runs in expected Θ(n lg n) time." 



Since the time bound now does not depend upon your 
input distribution, this means that unless we are 
extremely unlucky (as opposed to ill prepared or 
unpopular) we will certainly get good performance. 

Randomization is a general tool to improve algorithms 
with bad worst-case but good average-case complexity. 

The worst-case is still there, but we almost certainly 
won't see it. 
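
A minimal sketch of the random-pivot idea follows. The partition scheme used here (swap the random pivot to the end, then a single left-to-right scan) is one common choice and an assumption of this example, not necessarily the partition routine used elsewhere in these notes.

```python
import random


def randomized_quicksort(a, lo=0, hi=None):
    """Sort list a in place, choosing each pivot uniformly at random."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Pick a random pivot and move it out of the way to position hi.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    boundary = lo  # a[lo..boundary-1] holds the keys smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[boundary], a[j] = a[j], a[boundary]
            boundary += 1
    a[boundary], a[hi] = a[hi], a[boundary]
    randomized_quicksort(a, lo, boundary - 1)
    randomized_quicksort(a, boundary + 1, hi)


if __name__ == "__main__":
    data = list(range(20))  # already sorted: bad for a first-element pivot
    randomized_quicksort(data)
    print(data == sorted(data))
```

Because the pivot no longer depends on the input order, an already sorted array is treated like any other input.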



Argue that insertion sort is better than Quicksort for 
sorting checks 



In the best case, Quicksort takes O(n lg n). Although 
using median-of-three turns the sorted permutation into 
the best case, we lose if insertion sort is better on the 
given data. 

1234679 11 — 5 

In insertion sort, the cost of each insertion is the number 
of items which we have to jump over. In the check 
example, the expected number of moves per item is 
small, say c. We win if c ≪ lg n. 



Why do we analyze the average-case performance of a 
randomized algorithm, instead of the worst-case? 



In a randomized algorithm, the worst case is not a 
matter of the input but only of luck. Thus we want to 
know what kind of luck to expect. Every input we see 
is drawn from the uniform distribution. 



How many calls are made to Random in randomized 
quicksort in the best and worst cases? 



Random is called once in each call to partition. 

The number of partitions is Θ(n) in any run of 
quicksort!! 





There is some potential variation depending upon what 
you do with intervals of size 1 - do you call partition on 
intervals of size one? However, there is no asymptotic 
difference between best and worst case. 

The reason - any binary tree with n leaves has n— 1 
internal nodes, each of which corresponds to a call to 
partition in the quicksort recursion tree. 



Can we sort in better than n lg n? 

Any comparison-based sorting program can be thought 
of as defining a decision tree of possible executions. 

Running the same program twice on the same per- 
mutation causes it to do exactly the same thing, but 
running it on different permutations of the same data 
causes a different sequence of comparisons to be made 
on each. 




[Figure: decision tree for sorting three elements; its leaves include the permutations (1,3,2), (3,1,2), (2,3,1), and (3,2,1)] 



Claim: the height of this decision tree is the worst-case 
complexity of sorting. 



Once you believe this, a lower bound on the time com- 
plexity of sorting follows easily. 

Since any two different permutations of n elements 
require a different sequence of steps to sort, there 
must be at least n! different paths from the root to 
leaves in the decision tree, i.e. at least n! different 
leaves in the tree. 

Since only binary comparisons (less than or greater 
than) are used, the decision tree is a binary tree. 

Since a binary tree of height h has at most $2^h$ leaves, 
we know $n! \le 2^h$, or $h \ge \lg(n!)$. 

By inspection $n! > (n/2)^{n/2}$, since the last n/2 terms of 
the product are each greater than n/2. By Stirling's 
approximation, a better bound is $n! > (n/e)^n$ where 
e = 2.718. 

$$h \ge \lg (n/e)^n = n \lg n - n \lg e = \Omega(n \lg n)$$
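
To get a feel for the bound, the quantity ⌈lg(n!)⌉ can be computed exactly with integer arithmetic. This is a small illustrative sketch; the function name comparison_lower_bound is an assumption of this example.

```python
import math


def comparison_lower_bound(n):
    """Smallest h with 2**h >= n!, i.e. ceil(lg(n!)), using exact integers."""
    leaves = math.factorial(n)
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (leaves - 1).bit_length()


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```

The second column is the minimum worst-case number of comparisons forced by the decision tree argument; the third is n lg n for comparison.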



Non-Comparison-Based Sorting 

All the sorting algorithms we have seen assume binary 
comparisons as the basic primitive, questions of the 
form "is x before y?". 

Suppose you were given a deck of playing cards to sort. 
Most likely you would set up 13 piles and put all cards 
with the same number in one pile. 

A23456789 10JQK 

A23456789 10JQK 

A23456789 10JQK 

A23456789 10JQK 

With only a constant number of cards left in each pile, 
you can use insertion sort to order by suit and 
concatenate everything together. 

If we could find the correct pile for each card in 
constant time, and each pile gets O(1) cards, this 
algorithm takes O(n) time. 



Bucket sort 

Suppose we are sorting n numbers from 1 to m, where 
we know the numbers are approximately uniformly dis- 
tributed. 

We can set up n buckets, each responsible for an in- 
terval of m/n numbers from 1 to m 



[Figure: an array of buckets covering the ranges 1 to m/n, m/n+1 to 2m/n, 2m/n+1 to 3m/n, and so on] 



Given an input number x, it belongs in bucket number 
⌈xn/m⌉. 

If we use an array of buckets, each item gets mapped 
to the right bucket in O(1) time. 

With uniformly distributed keys, the expected number 
of items per bucket is 1. Thus sorting each bucket 
takes O(1) time! 

The total effort of bucketing, sorting buckets, and 
concatenating the sorted buckets together is O(n). 
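
The three phases (bucketing, sorting each bucket, concatenating) can be sketched as follows. This is an illustrative example that assumes integer keys in the range 1 to m; it uses Python's built-in list sort inside each bucket where the notes suggest insertion sort.

```python
def bucket_sort(keys, m):
    """Sort integers drawn from 1..m using len(keys) buckets."""
    n = len(keys)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in keys:
        # Bucket number ceil(x * n / m), shifted to a 0-based index.
        index = (x * n + m - 1) // m - 1
        buckets[index].append(x)
    result = []
    for bucket in buckets:
        bucket.sort()  # each bucket is expected to hold O(1) keys
        result.extend(bucket)
    return result


if __name__ == "__main__":
    print(bucket_sort([5, 1, 9, 3, 10, 7, 2], 10))
```

The bucket index is monotone in x, so concatenating the sorted buckets in order yields a sorted list even when the keys are not uniformly distributed; only the running time suffers.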

What happened to our Ω(n lg n) lower bound? 



We can use bucketsort effectively whenever we under- 
stand the distribution of the data. 

However, bad things happen when we assume the wrong 
distribution. 

Suppose in the previous example all the keys happened 
to be 1. After the bucketing phase, we have: 



[Figure: every key falls into the first bucket, covering 1 to m/n; the remaining buckets are empty] 



We spent linear time distributing our items into buckets 
and learned nothing. Perhaps we could split the big 
bucket recursively, but it is not certain that we will 
ever win unless we understand the distribution. 

Problems like this are why we worry about the worst- 
case performance of algorithms! 

Such distribution techniques can be used on strings 
instead of just numbers. The buckets will correspond 
to letter ranges instead of just number ranges. 

The worst case "shouldn't" happen if we understand 
the distribution of our data. 



Real World Distributions 

Consider the distribution of names in a telephone book. 

• Will there be a lot of Skiena's? 

• Will there be a lot of Smith's? 

• Will there be a lot of Shifflett's? 

Either make sure you understand your data, or use a 
good worst-case or randomized algorithm! 



[Figure: The Shifflett's of Charlottesville] 



For comparison, note that there are seven Shifflett's 
(of various spellings) in the 1000 page Manhattan 
telephone directory. 



Parallel Bubblesort 

In order for me to give back your midterms, please form 
a line and sort yourselves in alphabetical order, from A 
to Z. 

There is traditionally a strong correlation between the 
midterm grades and the number of daily problems at- 
tempted: 



daily   sum   count   avg 
0       134   3       44.67 
1       0     2       XXXXX 
2       63    1       63.00 
3       194   3       64.67 
4       335   5       67.00 
5       489   8       61.12 
6       381   6       63.50 
7       432   6       72.00 
8       217   3       72.33 
9       293   4       73.25 



Show that there is no sorting algorithm which sorts at 
least $(1/2^n) \times n!$ instances in O(n) time. 



Think of the decision tree which can do this. What 
is the shortest tree with $(1/2^n) \times n!$ leaves? 

$$h \ge \lg(n!/2^n) = \lg(n!) - \lg(2^n) = \Theta(n \lg n) - n = \Theta(n \lg n)$$



Moral: there cannot be too many good cases for any 
sorting algorithm! 



Show that the Ω(n lg n) lower bound for sorting still 
holds with ternary comparisons. 




The maximum number of leaves in a tree of height h 
is $3^h$, so 

$$h \ge \log_3(n!) = \Theta(n \lg n)$$



So it goes for any constant base. 



Optimization Problems 

In the algorithms we have studied so far, correctness 
tended to be easier than efficiency. In optimization 
problems, we are interested in finding a thing which 
maximizes or minimizes some function. 

In designing algorithms for optimization problems, we 
must prove that the algorithm in fact gives the best 
possible solution. 

Greedy algorithms, which make the best local decision 
at each step, occasionally produce a global optimum - 
but you need a proof! 



Dynamic Programming 

Dynamic Programming is a technique for computing 
recurrence relations efficiently by storing partial results. 



Computing Fibonacci Numbers 



$$F_n = F_{n-1} + F_{n-2}$$

$$F_0 = 0, \quad F_1 = 1$$



Implementing it as a recursive procedure is easy but 
slow! 

We keep calculating the same value over and over! 



[Figure: recursion tree for F(6), in which the subtrees for F(4), F(3), F(2), F(1), and F(0) are computed again and again] 



How slow is slow? 



$$F_{n+1}/F_n \approx \phi = (1 + \sqrt{5})/2 \approx 1.61803$$

Thus $F_n \approx 1.6^n$, and since our recursion tree has only 0 and 
1 as leaves, summing up to such a large number means we have $\approx 1.6^n$ calls! 



What about Dynamic 
Programming? 

We can calculate Fn in linear time by storing small 
values: 

F_0 = 0 
F_1 = 1 
For i = 2 to n 

    F_i = F_{i-1} + F_{i-2} 

Moral: we traded space for time. 
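
The contrast between the two approaches can be seen in a short sketch. The function names fib_recursive and fib_dp, and the one-element list used as a call counter, are choices made for this illustration only.

```python
def fib_recursive(n, calls):
    """Direct recursion; calls[0] counts how many invocations were made."""
    calls[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, calls) + fib_recursive(n - 2, calls)


def fib_dp(n):
    """Linear-time version that stores every partial result F_0..F_n."""
    f = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]


if __name__ == "__main__":
    counter = [0]
    print(fib_recursive(20, counter), counter[0])  # value and number of calls
    print(fib_dp(20))
```

The recursive version makes a number of calls that grows as fast as the Fibonacci numbers themselves, while the table version does one addition per entry.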

Dynamic programming is a technique for efficiently 
computing recurrences by storing partial results. 

Once you understand dynamic programming, it is usu- 
ally easier to reinvent certain algorithms than try to 
look them up! 

Dynamic programming is best understood by looking 
at a bunch of different examples. 

I have found dynamic programming to be one of the 
most useful algorithmic techniques in practice: 

• Morphing in Computer Graphics 

• Data Compression for High Density Bar Codes 

• Utilizing Grammatical Constraints for Telephone 
Keypads 



Multiplying a Sequence of Matrices 

Suppose we want to multiply a long sequence of 
matrices A × B × C × D. 

Multiplying an X × Y matrix by a Y × Z matrix (using 
the common algorithm) takes X × Y × Z multiplications. 



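$$\begin{pmatrix} 2 & 3 \\ 3 & 4 \\ 4 & 5 \end{pmatrix} \times \begin{pmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 13 & 18 & 23 \\ 18 & 25 & 32 \\ 23 & 32 & 41 \end{pmatrix}$$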



We would like to avoid big intermediate matrices, and 
since matrix multiplication is associative, we can 
parenthesize however we want. 

Matrix multiplication is not commutative, so we cannot 
permute the order of the matrices without changing the 
result. 



Example 



Consider A × B × C × D, where A is 30 × 1, B is 1 × 40, 
C is 40 × 10, and D is 10 × 25. 

Three of the possible parenthesizations are: 



((AB)C)D = 30×1×40 + 30×40×10 + 30×10×25 = 20,700 



(AB)(CD) = 30×1×40 + 40×10×25 + 30×40×25 = 41,200 



A((BC)D) = 1×40×10 + 1×10×25 + 30×1×25 = 1,400 

The order makes a big difference in real computation. 
How do we find the best order? 

Let M(i, j) be the minimum number of multiplications 
necessary to compute $\prod_{k=i}^{j} A_k$. 

The key observations are 

• The outermost parentheses partition the chain of 
matrices (i, j) at some k. 

• The optimal parenthesization order has optimal or- 
dering on either side of k. 



A recurrence for this is: 

$$M(i,j) = \min_{i \le k \le j-1} \left[ M(i,k) + M(k+1,j) + d_{i-1} d_k d_j \right]$$

$$M(i,i) = 0$$

If there are n matrices, there are n + 1 dimensions. 

A direct recursive implementation of this will be expo- 
nential, since there is a lot of duplicated work as in the 
Fibonacci recurrence. 

Divide-and-conquer seems efficient because there is 
no overlap, but . . . 

There are only $\binom{n}{2}$ substrings between 1 and n. Thus 
it requires only $\Theta(n^2)$ space to store the optimal cost 
for each of them. 

We can represent all the possibilities in a triangle 
matrix. We can also store the value of k in another triangle 
matrix to reconstruct the order of the optimal 
parenthesization. 

The diagonal moves up to the right as the computation 
progresses. On each element of the kth diagonal, $|j - i| = k$. 

For the previous example: 

Procedure MatrixOrder 
    for i = 1 to n do M[i, i] = 0 
    for diagonal = 1 to n − 1 
        for i = 1 to n − diagonal do 
            j = i + diagonal 
            M[i, j] = min over k = i, ..., j − 1 of [M[i, k] + M[k + 1, j] + d_{i−1} d_k d_j] 
            factor(i, j) = the k achieving this minimum 
    return M[1, n] 



Procedure ShowOrder(i, j) 
    if (i = j) write(A_i) 
    else 
        k = factor(i, j) 
        write "(" 
        ShowOrder(i, k) 
        write "*" 
        ShowOrder(k + 1, j) 
        write ")" 
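
The two procedures translate almost line for line into runnable code. The sketch below is one possible rendering, assuming the dimensions are passed as a list d of length n + 1 so that matrix i is d[i-1] × d[i]; the names matrix_order and show_order are choices made for this example.

```python
def matrix_order(d):
    """Return (minimum multiplications, factor table) for dimensions d[0..n]."""
    n = len(d) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]       # m[i][i] = 0 on the diagonal
    factor = [[0] * (n + 1) for _ in range(n + 1)]  # best split point k for (i, j)
    for diagonal in range(1, n):
        for i in range(1, n - diagonal + 1):
            j = i + diagonal
            # Try every split k and keep the cheapest (cost, k) pair.
            m[i][j], factor[i][j] = min(
                (m[i][k] + m[k + 1][j] + d[i - 1] * d[k] * d[j], k)
                for k in range(i, j)
            )
    return m[1][n], factor


def show_order(factor, i, j):
    """Rebuild the optimal parenthesization of A_i ... A_j as a string."""
    if i == j:
        return "A%d" % i
    k = factor[i][j]
    return "(" + show_order(factor, i, k) + "*" + show_order(factor, k + 1, j) + ")"


if __name__ == "__main__":
    cost, factor = matrix_order([30, 1, 40, 10, 25])
    print(cost, show_order(factor, 1, 4))
```

For the dimensions of the worked example (30 × 1, 1 × 40, 40 × 10, 10 × 25) this recovers the cost of 1,400 multiplications for A((BC)D).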



A dynamic programming 
solution has three components: 

1. Formulate the answer as a recurrence relation or 
recursive algorithm. 

2. Show that the number of different instances of 
your recurrence is bounded by a polynomial. 

3. Specify an order of evaluation for the recurrence 
so you always have what you need. 



Approximate String Matching 

A common task in text editing is string matching - 
finding all occurrences of a word in a text. 

Unfortunately, many words are misspelled. How can we 
search for the string closest to the pattern? 

Let P be a pattern string and T a text string over the 
same alphabet. 

A k-approximate match between P and T is a substring 
of T with at most k differences. 

Differences may be: 

1. the corresponding characters may differ: KAT → CAT 

2. P is missing a character from T: CAAT → CAT 

3. T is missing a character from P: CT → CAT 

Approximate Matching is important in genetics as well 
as spell checking. 



A 3-Approximate Match 

A match with one of each of three edit operations is: 

P = unescessaraly 

T = unnecessarily 

Finding such a matching seems like a hard problem 
because we must figure out where you add blanks, but 
we can solve it with dynamic programming. 

D[i, j] = the minimum number of differences between 
$p_1, p_2, \ldots, p_i$ and the segment of T ending at j. 

D[i, j] is the minimum of the three possible ways to 
extend smaller strings: 

1. If $p_i = t_j$ then D[i − 1, j − 1] else D[i − 1, j − 1] + 1 
(corresponding characters do or do not match) 

2. D[i − 1, j] + 1 (character in pattern which is not in 
text). 

3. D[i, j − 1] + 1 (extra character in text - we do not 
advance the pattern pointer). 

Once you accept the recurrence it is easy. 

To fill each cell, we need only consider three other cells, 
not O(n) as in other examples. This means we need 
only store two rows of the table. The total time is 
O(mn). 



Boundary conditions for string matching 

What should the value of D[0, i] be, corresponding to 
the cost of matching the first i characters of the text 
with none of the pattern? 

It depends. Are we doing string matching in the text 
or substring matching? 

• If you want to match all of the pattern against all 
of the text, this means that we would have to delete 
the first i characters of the text, so D[0, i] = i 
to pay the cost of the deletions. 

• If we want to find the place in the text where the 
pattern occurs, we do not want to pay more of 
a cost if the pattern occurs far into the text than 
near the front, so it is important that the starting cost 
be equal for all positions. In this case, D[0, i] = 
0, since we pay no cost for deleting the first i 
characters of the text. 

In both cases, D[i, 0] = i, since we cannot excuse 
deleting the first i characters of the pattern without cost. 



What do we return? 

If we want the cost of comparing all of the pattern 
against all of the text, such as comparing the spelling 
of two words, all we are interested in is D[n,m]. 

But what if we want the cheapest match of the 
pattern anywhere in the text? Assuming the 
initialization for substring matching, we seek the cheapest 
matching of the full pattern ending anywhere in the 
text. This means the cost equals $\min_{1 \le i \le m} D[n, i]$. 

This only gives the cost of the optimal matching. The 
actual alignment - what got matched, substituted, and 
deleted - can be reconstructed from the pattern/text 
and table without any auxiliary storage, once we have 
identified the cell with the lowest cost. 
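
The recurrence, the two boundary conditions, and the two return rules fit in one short function. This is a sketch under the conventions above (pattern of length n, text of length m); the function name approximate_match and the whole_text flag are assumptions of this example, and it keeps the full table rather than two rows for clarity.

```python
def approximate_match(pattern, text, whole_text=False):
    """Return (cost, end) of the cheapest match of pattern in text.

    With whole_text=True the entire pattern is matched against the entire
    text; otherwise the match may end (and start) anywhere in the text.
    """
    n, m = len(pattern), len(text)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                      # unmatched pattern characters always cost
    if whole_text:
        for j in range(1, m + 1):
            d[0][j] = j                  # skipped text characters cost too
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if pattern[i - 1] == text[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + mismatch,  # match or substitute
                          d[i - 1][j] + 1,             # pattern character not in text
                          d[i][j - 1] + 1)             # extra character in text
    if whole_text:
        return d[n][m], m
    end = min(range(m + 1), key=lambda j: d[n][j])
    return d[n][end], end


if __name__ == "__main__":
    print(approximate_match("CAT", "xxCATxx"))
    print(approximate_match("unescessaraly", "unnecessarily", whole_text=True))
```

The second call compares the two spellings from the 3-approximate match example as whole words.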



How much space do we need? 

Do we need to keep all O(mn) cells? No: if we evaluate 
the recurrence filling in the columns of the matrix from 
left to right, we will never need more than two columns 
of cells to do what we need. Thus O(m) space is 
sufficient to evaluate the recurrence without changing 
the time complexity at all. 

Unfortunately, because we won't have the full matrix 
we cannot reconstruct the alignment, as above. 

Saving space in dynamic programming is very 
important. Since memory on any computer is limited, O(nm) 
space is more of a bottleneck than O(nm) time. 

Fortunately, there is a clever divide-and-conquer 
algorithm which computes the actual alignment in O(nm) 
time and O(m) space. 



Give an $O(n^2)$ algorithm to find the longest monotonically 
increasing sequence in a sequence of n numbers. 



Build an example first: (5, 2, 8, 7, 3, 1, 6, 4) 

Ask yourself what would you like to know about the 
first n − 1 elements to tell you the answer for the entire 
sequence? 

1. The length of the longest sequence in $s_1, s_2, \ldots, s_{n-1}$ 
(seems obvious) 

2. The length of the longest sequence $s_n$ will extend! 
(not as obvious - this is the idea!) 

Let $s_i$ be the length of the longest sequence ending 
with the ith character: 



sequence   5 2 8 7 3 1 6 4 
s_i        1 1 2 2 2 1 3 3 



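How do we compute $s_i$? 

$$s_i = \max_{0 \le j < i,\; seq[j] < seq[i]} s_j + 1$$

$$s_0 = 0$$

(Here seq[0] is taken to be a sentinel smaller than every element, so that j = 0 is always a valid choice.) 

To find the longest sequence - we know it ends 
somewhere, so Length = $\max_{1 \le i \le n} s_i$ 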
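
A direct implementation of this recurrence is sketched below. It uses 0-based positions and starts every s_i at 1 instead of using the s_0 sentinel, and it also records predecessors so the sequence itself can be recovered; those details and the function name are choices made for this example.

```python
def longest_increasing_subsequence(seq):
    """Return (s, best) where s[i] is the length of the longest increasing
    sequence ending at position i and best is one longest such sequence."""
    n = len(seq)
    s = [1] * n
    pred = [-1] * n  # previous position in the best sequence ending at i
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i] and s[j] + 1 > s[i]:
                s[i] = s[j] + 1
                pred[i] = j
    if n == 0:
        return s, []
    end = max(range(n), key=lambda i: s[i])
    best = []
    while end != -1:
        best.append(seq[end])
        end = pred[end]
    return s, best[::-1]


if __name__ == "__main__":
    print(longest_increasing_subsequence([5, 2, 8, 7, 3, 1, 6, 4]))
```

The double loop examines each pair (j, i) once, which gives the quadratic running time asked for in the problem.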



The Principle of Optimality 

To use dynamic programming, the problem must 
observe the principle of optimality, that whatever the 
initial state is, remaining decisions must be optimal with 
regard to the state following from the first decision. 

Combinatorial problems may have this property but 
may use too much memory/time to be efficient. 



Example: The Traveling Salesman Problem 

Let $T(i; j_1, j_2, \ldots, j_k)$ be the cost of the optimal tour 
from i to 1 that goes through each of the other cities $j_1, j_2, \ldots, j_k$ once. 

$$T(i; j_1, j_2, \ldots, j_k) = \min_{1 \le m \le k} C[i, j_m] + T(j_m; \{j_1, j_2, \ldots, j_k\} \setminus \{j_m\})$$

$$T(i; j) = C(i, j) + C(j, 1)$$

Here there can be any subset of $j_1, j_2, \ldots, j_k$ instead of 
any subinterval - hence exponential. 

Still, with other ideas (some type of pruning or best- 
first search) it can be effective for combinatorial search. 



When can you use Dynamic 
Programming? 

Dynamic programming computes recurrences efficiently 
by storing partial results. Thus dynamic programming 
can only be efficient when there are not too many par- 
tial results to compute! 

There are n! permutations of an n-element set - we 
cannot use dynamic programming to store the best 
solution for each subpermutation. There are $2^n$ subsets 
of an n-element set - we cannot use dynamic 
programming to store the best solution for each. 

However, there are only n(n − 1)/2 contiguous 
substrings of a string, each described by a starting and 
ending point, so we can use it for string problems. 

There are only n(n − 1)/2 possible subtrees of a binary 
search tree, each described by a maximum and 
minimum key, so we can use it for optimizing binary search 
trees. 

Dynamic programming works best on objects which 
are linearly ordered and cannot be rearranged - 
characters in a string, matrices in a chain, points around 
the boundary of a polygon, the left-to-right order of 
leaves in a search tree. 

Whenever your objects are ordered in a left-to-right 
way, you should smell dynamic programming! 



Minimum Length Triangulation 

A triangulation of a polygon is a set of non-intersecting 
diagonals which partitions the polygon into triangles. 





The length of a triangulation is the sum of the diagonal 
lengths. 



We seek to find the minimum length triangulation. For 
a convex polygon, or part thereof: 




Once we identify the correct connecting vertex, the 
polygon is partitioned into two smaller pieces, both of 
which must be triangulated optimally! 



$$t[i, i+1] = 0$$

$$t[i, j] = \min_{i < k < j} \left( t[i,k] + t[k,j] + |ik| + |kj| \right)$$

Evaluation proceeds as in the matrix multiplication 
example - $\binom{n}{2}$ values of t, each of which takes O(j − i) 
time if we evaluate the sections in order of increasing 
size. 



[Figure: a hexagon with vertices numbered 1 to 6] 

j − i = 2: 13, 24, 35, 46, 51, 62 

j − i = 3: 14, 25, 36, 41, 52, 63 

j − i = 4: 15, 26, 31, 42, 53, 64 

Finish with 16 



What if there are points in the interior of the polygon? 



Dynamic Programming and 
High Density Bar Codes 

Symbol Technology has developed a new design for bar 
codes, PDF-417 that has a capacity of several hundred 
bytes. What is the best way to encode text for this 
design? 







They developed a complicated mode-switching data 
compression scheme. 




Latch commands permanently put you in a different 
mode. Shift commands temporarily put you in a dif- 
ferent mode. 



Originally, Symbol used a greedy algorithm to encode 
a string, making local decisions only. We realized that 
for any prefix, you want an optimal encoding which 
might leave you in every possible mode. 

[Figure: dynamic programming table for encoding "The Quick Brown Fox", with one row per mode (Alpha, Lower, Mixed, Punct.)] 

M[i, j] = min over k of (M[i − 1, k] + the cost of encoding the ith 
character and ending up in mode j). 

Our simple dynamic programming algorithm improved 
the capacity of PDF-417 by an average of 8%! 



Dynamic Programming and Morphing 

Morphing is the problem of creating a smooth series of 
intermediate images given a starting and ending image. 



The key problem is establishing a correspondence 
between features in the two images. You want to morph 
an eye to an eye, not an eye to an ear. 

We can do this matching on a line-by-line basis: 





[Figure: line-by-line matching between Object A's segments (T = 0) and Object B's segments (T = 1), with the intermediate frame at T = 0.5] 



This should sound like string matching, but with a dif- 
ferent set of operations: 

• Full run match: We may match run i on top to 
run j on bottom for a cost which is a function of 
the difference in the lengths of the two runs and 
their positions. 



• Merging runs: We may match a string of consecutive 
runs on top to a run on bottom. The cost will 
be a function of the number of runs, their relative 
positions, and lengths. 



• Splitting runs: We may match a big run on top to 
a string of consecutive runs on the bottom. This 
is just the converse of the merge. Again, the cost 
will be a function of the number of runs, their 
relative positions, and lengths. 

This algorithm was incorporated into a morphing 
system, with the following results: 



[Figure: sample output of the morphing system] 




Problem Solving Techniques 

Most important: make sure you understand exactly 
what the question is asking - if not, you have no hope 
of answering it!! 

Never be afraid to ask for another explanation of a 
problem until it is clear. 

Play around with the problem by constructing examples 
to get insight into it. 

Ask yourself questions. Does the first idea which comes 
into my head work? If not, why not? 

Am I using all information that I am given about the 
problem? 

Read Polya's book How to Solve it. 



The Euclidean traveling-salesman problem is the prob- 
lem of determining the shortest closed tour that con- 
nects a given set of n points in the plane. 

Bentley suggested simplifying the problem by restricting 
attention to bitonic tours, that is tours which start 
at the leftmost point, go strictly left to right to the 
rightmost point, and then go strictly right to left back to the 
starting point. 





non-bitonic 



bitonic 



Describe an $O(n^2)$ algorithm for finding the optimal 
bitonic tour. You may assume that no two points have 
the same x-coordinate. (Hint: scan left to right, 
maintaining optimal possibilities for the two parts of the 
tour.) 



Make sure you understand what a bitonic tour is, or 
else it is hopeless. 

First of all, play with the problem. Why isn't it trivial? 

"Hey, I guess this tour can zig-zag a lot." 

"Hey, I guess I can't tell an upper point from a lower point." 

"Hey, I guess that I can have an arbitrary number of upper 
or lower points in a row." 



Am I using all the information? 

Why will they let us assume that no two x-coordinates 
are the same? What does the hint mean? What 
happens if I scan from left to right? 

If we scan from left to right, we get an open tour which 
uses all points to the left of our scan line. 




In the optimal tour, the kth point is connected to 
exactly one point to the left of k (k ≠ n). Once I decide 
which point that is, say x, I need the optimal partial 
tour where the two endpoints are x and k − 1, because 
if it isn't optimal I could come up with a better one. 
Hey, I have got a recurrence! And look, the two 
parameters which describe my optimal tour are the two 
endpoints. 



Let c[k, n] be the optimal cost partial tour where the 
two endpoints are k < n. 

$$c[k, n] \le c[k, n-1] + d[n, n-1] \quad (\text{when } k < n-1)$$

$$c[n-1, n] \le c[k, n-1] + d[k, n]$$

$$c[0, 1] = d[0, 1]$$



[Figure: the table of values c[k, n], with rows indexed by K and columns by N, filled in starting from the entry d(0, 1)] 



Filling the entries in from N = 1 to n, k = 1 to N, means we 
always have what we need waiting for us. 



c[n − 1, n] takes O(n) to update, c[k, n] for k < n − 1 takes 
O(1) to update. Total time is $O(n^2)$. 

But this doesn't quite give the tour, just an open 
tour. We simply must figure out where the last edge to n 
must go. 

$$\text{Tour cost} = \min_{k < n} c[k, n] + d[k, n]$$
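
Putting the recurrence and the closing step together gives the following sketch. It assumes points are (x, y) tuples with distinct x-coordinates and numbers them from 0 after sorting by x; the function name bitonic_tour_length is a choice made for this example, and it returns only the cost, not the tour.

```python
import math


def bitonic_tour_length(points):
    """Length of the shortest bitonic tour through points with distinct x."""
    pts = sorted(points)
    n = len(pts)
    if n < 2:
        return 0.0

    def d(a, b):
        return math.hypot(pts[a][0] - pts[b][0], pts[a][1] - pts[b][1])

    inf = float("inf")
    # c[k][j], k < j: cheapest open tour covering points 0..j with endpoints k and j.
    c = [[inf] * n for _ in range(n)]
    c[0][1] = d(0, 1)
    for j in range(2, n):
        for k in range(j - 1):
            c[k][j] = c[k][j - 1] + d(j - 1, j)                     # O(1) update
        c[j - 1][j] = min(c[k][j - 1] + d(k, j) for k in range(j - 1))  # O(n) update
    # Close the open tour by choosing where the last edge into point n-1 goes.
    return min(c[k][n - 1] + d(k, n - 1) for k in range(n - 1))


if __name__ == "__main__":
    print(bitonic_tour_length([(0, 0), (1, 1), (2, 1), (3, 0)]))
```

Each column does one linear-time update and otherwise constant-time updates, which matches the quadratic total claimed above.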



Divide and Conquer 

Divide and conquer was a successful military strategy 
long before it became an algorithm design paradigm. 
The wise general would attack so as to divide the en- 
emy army into two forces and then mop up one after 
the other. 

To use divide and conquer as an algorithm design tech- 
nique, we must divide the problem into two smaller 
subproblems, solve each of them recursively, and then 
meld the two partial solutions into one solution to the 
full problem. Whenever the merging takes less time 
than solving the two subproblems, we get an efficient 
algorithm. 

Mergesort is the classic example of a divide-and-conquer 
algorithm. It takes only linear time to merge two sorted 
lists of n/2 elements, each of which was obtained in 
O(n lg n) time. 

Divide and conquer is a design technique with many 
important algorithms to its credit, including mergesort, 
the fast Fourier transform, and Strassen's matrix mul- 
tiplication algorithm. 



Fast Exponentiation 

Suppose that we need to compute the value of $a^n$ for 
some reasonably large n. Such problems occur in 
primality testing for cryptography. 

The simplest algorithm performs n − 1 multiplications, 
by computing a × a × . . . × a. 

However, we can do better by observing that $n = 
\lfloor n/2 \rfloor + \lceil n/2 \rceil$. If n is even, then $a^n = (a^{n/2})^2$. If n 
is odd, then $a^n = a(a^{\lfloor n/2 \rfloor})^2$. In either case, we have 
halved the size of our exponent at the cost of at most 
two multiplications, so O(lg n) multiplications suffice 
to compute the final value. 

function power(a, n) 

    if (n = 0) return(1) 
    x = power(a, ⌊n/2⌋) 
    if (n is even) then return(x^2) 
    else return(a × x^2) 
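
The pseudocode above carries over directly to a runnable function. This version is a plain transcription for nonnegative integer exponents, offered as a sketch.

```python
def power(a, n):
    """Compute a**n for an integer n >= 0 using O(lg n) multiplications."""
    if n == 0:
        return 1
    x = power(a, n // 2)   # a raised to floor(n/2)
    if n % 2 == 0:
        return x * x       # one multiplication when n is even
    return a * x * x       # two multiplications when n is odd


if __name__ == "__main__":
    print(power(2, 10))
```

The recursion depth is about lg n, so even very large exponents need only a handful of levels.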



This simple algorithm illustrates an important principle 
of divide and conquer. It always pays to divide a job 
as evenly as possible. 



Twenty Questions 

In Twenty questions one player selects a word, and 
the other repeatedly asks true/false questions in an 
attempt to identify the word. If the word remains 
unidentified after 20 questions, the first party wins; 
otherwise, the second player wins. 

In fact, the second player always has a winning strategy, 
based on binary search. Given a printed dictionary, 
the player opens it in the middle, selects a word (say 
"move" ), and asks whether the unknown word is before 
"move" in alphabetical order. 



Since standard dictionaries contain 50,000 to 200,000 
words, we can be certain that the process will always 
terminate within twenty questions. 



Finding a Transition 

Other interesting algorithms follow from simple 
variants of binary search. 

Suppose we have an array A consisting of a run of 0's, 
followed by an unbounded run of 1's, and would like to 
identify the exact point of transition between them: 



0000000000000000000000011111111111 

Binary search on the array would provide the transition 
point in ⌈lg n⌉ tests. 

Clearly there is no way to solve this problem any faster. 



One-Sided Binary Search 

Suppose that we want to search in a sorted array, but 
we do not know how large the array is. All we know is 
the starting point. 



{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .} 

How can we use binary search without both bound- 
aries? 

In the absence of such a bound, we can test repeatedly 
at larger intervals (A[1], A[2], A[4], A[8], A[16], ...) 
until we find a first nonzero value. Now we have a 
window containing the target and can proceed with 
binary search. 

This one-sided binary search finds the transition point 
p using at most 2⌈lg p⌉ comparisons, regardless of how 
large the array actually is. 

One-sided binary search is most useful whenever we are 
looking for a key that probably lies close to our current 
position. 
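
The doubling phase followed by an ordinary binary search can be sketched as below. The array is modelled as a predicate is_one(i) on positions 1, 2, 3, ... that is false on the run of 0's and true from the transition point onward; that interface and the function name are assumptions of this example.

```python
def one_sided_search(is_one):
    """Return the smallest position p >= 1 with is_one(p) true."""
    hi = 1
    while not is_one(hi):   # probe positions 1, 2, 4, 8, ...
        hi *= 2
    lo = hi // 2            # everything at or before lo is known to be a 0
    # Invariant: positions <= lo are 0's and position hi is a 1.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_one(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    transition = 24
    print(one_sided_search(lambda i: i >= transition))
```

The number of probes depends only on the transition point p, not on how long the run of 1's continues.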



Square and Other Roots 

The square root of n is the number r such that r^2 = n.
Square root computations are performed inside every
pocket calculator - but how?

Observe that the square root of n > 1 must be at least
1 and at most n. Let l = 1 and r = n. Consider the
midpoint of this interval, m = (l + r)/2. How does m^2
compare to n?

If n > m^2, then the square root must be greater than
m, so the algorithm repeats with l = m. If n < m^2, then
the square root must be less than m, so the algorithm
repeats with r = m.

Either way, we have halved the interval with only one
comparison. Therefore, after only lg n rounds we will
have identified the square root to within ±1.

This bisection method, as it is called in numerical anal- 
ysis, can also be applied to the more general problem 
of finding the roots of an equation. We say that x is a
root of the function f if f(x) = 0.
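As a rough sketch of the bisection idea for square roots, assuming n >= 1 and floating-point arithmetic: the name `bisect_sqrt` and the fixed `rounds` parameter are choices made here for illustration, not part of the lecture.

```python
def bisect_sqrt(n, rounds=60):
    """Approximate the square root of n >= 1 by repeated halving."""
    lo, hi = 1.0, float(n)          # the root lies somewhere in [1, n]
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if mid * mid < n:
            lo = mid                # root is to the right of mid
        else:
            hi = mid                # root is at or to the left of mid
    return (lo + hi) / 2
```

A fixed number of rounds is used instead of a tolerance test so the loop always terminates, even when the interval can no longer shrink in floating point.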



Find the missing integer from 0 to n using O(n) "is
bit[j] in A[i]" queries.



Note - there are a total of n lg n bits, so we are not
allowed to read the entire input!

Also note, the problem is asking us to minimize the
number of bits we read. We can spend as much time 
as we want doing other things provided we don't look 
at extra bits. 

How can we find the last bit of the missing integer? 

Ask all the n integers what their last bit is and see
whether 0 or 1 is the bit which occurs less often than
it is supposed to. That is the last bit of the missing 
integer! 

How can we determine the second-to-last bit? 

Ask the ≈ n/2 numbers which ended with the correct
last bit! By analyzing the bit patterns of the numbers
from 0 to n which end with this bit, we know how often
0 and 1 should occur as the second-to-last bit.

By recurring on the remaining candidate numbers, we
get the answer in T(n) = T(n/2) + n = O(n), by the
Master Theorem.
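One way to sketch this bit-by-bit elimination is shown below. The function name `find_missing` and the probe counter are assumptions made here. Only the shift-and-mask on elements of `a` counts as a bit query; the bookkeeping on `expected` is the "other work" we are allowed to do for free.

```python
def find_missing(a, n):
    """a holds every integer in 0..n except one; return (missing, probes)."""
    probes = 0
    present = list(a)               # input numbers still consistent so far
    expected = list(range(n + 1))   # what would be there if nothing were missing
    j = 0
    while present:
        zeros, ones = [], []
        for x in present:
            probes += 1             # one "is bit j set in x?" query
            (ones if (x >> j) & 1 else zeros).append(x)
        exp_ones = [x for x in expected if (x >> j) & 1]
        exp_zeros = [x for x in expected if not (x >> j) & 1]
        # The side that is one element short contains the missing integer.
        if len(ones) < len(exp_ones):
            present, expected = ones, exp_ones
        else:
            present, expected = zeros, exp_zeros
        j += 1
    return expected[0], probes
```

The candidate set roughly halves each round, so the probes sum to about n + n/2 + n/4 + ..., matching the recurrence above.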



Graphs 



A graph G consists of a set of vertices V together with 
a set E of vertex pairs or edges. 

Graphs are important because any binary relation is a 
graph, so graphs can be used to represent essentially 
any relationship. 

Example: A network of roads, with cities as vertices 
and roads between cities as edges. 



[Figure: road network with vertices (cities) Green Port,
Orient Point, Shelter Island, and Sag Harbor, and edges
(roads) between them.]



Example: An electronic circuit, with junctions as ver-
tices and components as edges.



vertices: junctions 
edges: components 



To understand many problems, we must think of them 
in terms of graphs! 



The Friendship Graph 

Consider a graph where the vertices are people, and 
there is an edge between two people if and only if they 
are friends. 







This graph is well-defined on any set of people: SUNY 
SB, New York, or the world. 

What questions might we ask about the friendship 
graph? 

• If I am your friend, does that mean you are 
my friend? 

A graph is undirected if (x,y) implies (y,x).
Otherwise the graph is directed. The "heard-of" graph
is directed since countless famous people have never 
heard of me! The "had-sex-with" graph is presum- 
ably undirected, since it requires a partner. 

• Am I my own friend? 

An edge of the form (x,x) is said to be a loop.
If x is y's friend several times over, that could be
modeled using multiedges, multiple edges between
the same pair of vertices. A graph is said to be
simple if it contains no loops and no multiple edges.



• Am I linked by some chain of friends to the
President?

A path is a sequence of edges connecting two ver- 
tices. Since Mel Brooks is my father's-sister's- 
husband's cousin, there is a path between me and 
him! 

Steve - Dad - Aunt Eve - Uncle Lenny - Cousin Mel



• How close is my link to the President? 

If I were trying to impress you with how tight I 
am with Mel Brooks, I would be much better off 
saying that Uncle Lenny knows him than to go into 
the details of how connected I am to Uncle Lenny. 
Thus we are often interested in the shortest path 
between two nodes. 

• Is there a path of friends between any two 
people? 

A graph is connected if there is a path between 
any two vertices. A directed graph is strongly con- 
nected if there is a directed path between any two 
vertices. 

• Who has the most friends? 

The degree of a vertex is the number of edges 
adjacent to it. 



• What is the largest clique?

A social clique is a group of mutual friends who all 
hang around together. A graph theoretic clique is 
a complete subgraph, where each vertex pair has 
an edge between them. Cliques are the densest 
possible subgraphs. Within the friendship graph, 
we would expect that large cliques correspond to 
workplaces, neighborhoods, religious organizations, 
schools, and the like. 

• How long will it take for my gossip to get back
to me?

A cycle is a path where the last vertex is adjacent 
to the first. A cycle in which no vertex repeats 
(such as 1-2-3-1 versus 1-2-3-2-1) is said to be
simple. The shortest cycle in the graph defines its 
girth, while a simple cycle which passes through 
each vertex is said to be a Hamiltonian cycle. 



Data Structures for Graphs 

There are two main data structures used to represent 
graphs. 



Adjacency Matrices 



An adjacency matrix is an n × n matrix M, where
M[i,j] = 0 iff there is no edge from vertex i to vertex j.




0 1 0 0 1
1 0 1 1 1
0 1 0 1 0
0 1 1 0 1
1 1 0 1 0



It takes O(1) time to test if (i,j) is in a graph repre-
sented by an adjacency matrix.

Can we save space if (1) the graph is undirected? (2) 
if the graph is sparse? 



Adjacency Lists 



An adjacency list consists of an n × 1 array of pointers,
where the ith element points to a linked list of the
edges incident on vertex i.








To test if edge (i,j) is in the graph, we search the ith
list for j, which takes O(d_i), where d_i is the degree of
the ith vertex.



Note that di can be much less than n when the graph 
is sparse. If necessary, the two copies of each edge can 
be linked by a pointer to facilitate deletions. 



Tradeoffs Between Adjacency
Lists and Adjacency Matrices



Comparison                         Winner
Faster to test if (x,y) exists?    matrices
Faster to find vertex degree?      lists
Less memory on small graphs?       lists (m + n) vs. (n^2)
Less memory on big graphs?         matrices (small win)
Edge insertion or deletion?        matrices O(1)
Faster to traverse the graph?      lists m + n vs. n^2
Better for most problems?          lists



Both representations are very useful and have different 
properties, although adjacency lists are probably better 
for most problems. 
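To make the comparison concrete, here is a small sketch that builds both representations of the same undirected graph. The helper name `build_adjacency` and the 5-vertex example graph in the usage note are hypothetical, and Python lists stand in for the linked lists.

```python
def build_adjacency(n, edges):
    """Return (matrix, lists) for an undirected graph on vertices 0..n-1."""
    matrix = [[0] * n for _ in range(n)]   # n^2 space, O(1) edge test
    lists = [[] for _ in range(n)]         # n + m space, O(degree) edge test
    for u, v in edges:
        matrix[u][v] = matrix[v][u] = 1    # undirected: store both directions
        lists[u].append(v)
        lists[v].append(u)
    return matrix, lists
```

For example, `build_adjacency(5, [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)])` gives a matrix where `matrix[0][4]` answers the edge query directly, while the degree of vertex 1 is simply `len(lists[1])`.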



Traversing a Graph 

One of the most fundamental graph problems is to 
traverse every edge and vertex in a graph. Applications 
include: 



• Printing out the contents of each edge and vertex. 

• Counting the number of edges. 

• Identifying connected components of a graph. 

For efficiency, we must make sure we visit each edge
at most twice.

For correctness, we must do the traversal in a system- 
atic way so that we don't miss anything. 

Since a maze is just a graph, such an algorithm must be 
powerful enough to enable us to get out of an arbitrary 
maze. 



Marking Vertices 

The idea in graph traversal is that we must mark each
vertex when we first visit it, and keep track of what
we have not yet completely explored.

For each vertex, we can maintain two flags: 

• discovered - have we ever encountered this vertex 
before? 

• completely-explored - have we finished exploring 
this vertex yet? 

We must also maintain a structure containing all the 
vertices we have discovered but not yet completely ex- 
plored. 

Initially, only a single start vertex is considered to be 
discovered. 

To completely explore a vertex, we look at each edge 
going out of it. For each edge which goes to an undis- 
covered vertex, we mark it discovered and add it to the 
list of work to do. 

Note that regardless of what order we fetch the next
vertex to explore, each edge is considered exactly twice,
once when each of its endpoints is explored.



Correctness of Graph Traversal 

Every edge and vertex in the connected component is 
eventually visited. 

Suppose not, ie. there exists a vertex which was un- 
visited whose neighbor was visited. This neighbor will 
eventually be explored so we would visit it: 




The square of a directed graph G = (V, E) is the graph
G^2 = (V, E^2) such that (u,w) ∈ E^2 iff for some v ∈ V,
both (u,v) ∈ E and (v,w) ∈ E; ie. there is a path of
exactly two edges.

Give efficient algorithms for both adjacency lists and
matrices.



Given an adjacency matrix, we can check in constant
time whether a given edge exists. To discover whether
there is an edge (u,w) ∈ G^2, for each possible interme-
diate vertex v we can check whether (u,v) and (v,w)
exist in O(1).

Since there are at most n intermediate vertices to
check, and n^2 pairs of vertices to ask about, this takes
O(n^3) time.

With adjacency lists, we have a list of all the edges in
the graph. For a given edge (u,v), we can run through
all the edges from v in O(n) time, and fill the results
into an adjacency matrix of G^2, which is initially empty.

It takes O(mn) to construct the edges, and O(n^2) to
initialize and read the adjacency matrix, a total of
O((n + m)n). Since n ≤ m + 1 unless the graph is dis-
connected, this is usually simplified to O(mn), and is
faster than the previous algorithm on sparse graphs.

Why is it called the square of a graph? Because the 
square of the adjacency matrix is the adjacency ma- 
trix of the square! This provides a theoretically faster 
algorithm. 
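The adjacency-list approach might look like the sketch below. It assumes the graph is given as a dictionary mapping each vertex to a list of out-neighbors, and it collects the result in sets rather than in the adjacency matrix described above; both choices, and the name `square`, are illustrative assumptions.

```python
def square(adj):
    """Return the out-neighbor sets of G^2 for a directed graph adj."""
    sq = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:          # first edge (u, v)
            for w in adj[v]:      # second edge (v, w)
                sq[u].add(w)      # so (u, w) is an edge of G^2
    return sq
```

The two inner loops touch each edge (u,v) once and then at most n edges out of v, which gives the O(mn) bound.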



Traversal Orders 

The order we explore the vertices depends upon what 
kind of data structure is used: 

• Queue - by storing the vertices in a first-in, first-
out (FIFO) queue, we explore the oldest unexplored
vertices first. Thus our explorations radiate
out slowly from the starting vertex, defining a so- 
called breadth-first search. 

• Stack - by storing the vertices in a last-in, first- 
out (LIFO) stack, we explore the vertices by lurch- 
ing along a path, constantly visiting a new neigh- 
bor if one is available, and backing up only if we 
are surrounded by previously discovered vertices. 
Thus our explorations quickly wander away from 
our starting point, defining a so-called depth-first 
search. 

The three possible colors of each node reflect if it is 
unvisited (white), visited but unexplored (grey) or com- 
pletely explored (black). 



Breadth-First Search 

BFS(G,s)
    for each vertex u ∈ V[G] − {s} do
        color[u] = white
        d[u] = ∞, ie. the distance from s
        p[u] = NIL, ie. the parent in the BFS tree
    color[s] = grey
    d[s] = 0
    p[s] = NIL
    Q = {s}
    while Q ≠ ∅ do
        u = head[Q]
        for each v ∈ Adj[u] do
            if color[v] = white then
                color[v] = grey
                d[v] = d[u] + 1
                p[v] = u
                enqueue[Q,v]
        dequeue[Q]
        color[u] = black
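A runnable sketch of breadth-first search is given below. It assumes the graph is a dictionary from each vertex to a list of neighbors, and it uses membership in `dist` in place of the explicit white/grey/black colors; the function name `bfs` is an assumption.

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, parent) for every vertex reachable from s."""
    dist = {s: 0}            # being in dist means "discovered"
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()  # oldest discovered, not yet explored vertex
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent
```

Following `parent` pointers back from any vertex traces one shortest path to `s` in an unweighted graph.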



Depth-First Search 

DFS has a neat recursive implementation which elimi- 
nates the need to explicitly use a stack. 

Discovery and final times are sometimes a convenience 
to maintain. 

DFS(G)
    for each vertex u ∈ V[G] do
        color[u] = white
        parent[u] = nil
    time = 0
    for each vertex u ∈ V[G] do
        if color[u] = white then DFS-VISIT(u)

Initialize each vertex in the main routine, then do a
search from each connected component. BFS must
also start from a vertex in each component to com-
pletely visit the graph.

DFS-VISIT(u)
    color[u] = grey (*u had been white/undiscovered*)
    discover[u] = time
    time = time + 1
    for each v ∈ Adj[u] do
        if color[v] = white then
            parent[v] = u
            DFS-VISIT(v)
    color[u] = black (*now finished with u*)
    finish[u] = time
    time = time + 1
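The recursive formulation translates almost line for line into the sketch below. The dictionary-of-lists graph format and the name `dfs` are assumptions; Python's recursion limit means this form suits small graphs only.

```python
def dfs(adj):
    """Return (discover, finish, parent) times for a DFS of the whole graph."""
    color = {u: "white" for u in adj}
    parent = {u: None for u in adj}
    discover, finish = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "grey"            # u had been white/undiscovered
        discover[u] = time
        time += 1
        for v in adj[u]:
            if color[v] == "white":
                parent[v] = u
                visit(v)
        color[u] = "black"           # now finished with u
        finish[u] = time
        time += 1

    for u in adj:                    # restart in every unvisited component
        if color[u] == "white":
            visit(u)
    return discover, finish, parent
```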



BFS Trees 

If BFS is performed on a connected, undirected graph, 
a tree is defined by the edges involved with the discov- 
ery of new nodes: 




This tree defines a shortest path from the root to every
other node in the tree.

The proof is by induction on the length of the shortest 
path from the root: 

• Length = 1 First step of BFS explores all neigh- 
bors of the root. In an unweighted graph one edge 
must be the shortest path to any node. 

• Length = s Assume the BFS tree has the shortest
paths up to length s − 1. Any node at a distance of
s will first be discovered by expanding a distance
s − 1 node.



The key idea about DFS 

A depth-first search of a graph organizes the edges of 
the graph in a precise way. 

In a DFS of an undirected graph, we assign a direction
to each edge, from the vertex which discovers it:





In a DFS of an undirected graph, every edge is either a
tree edge or a back edge.

In a DFS of a directed graph, no cross edge goes to a 
higher numbered or rightward vertex. Thus, no edge 
from 4 to 5 is possible: 




Edge Classification for DFS

What about the other edges in the graph? Where can 
they go on a search? 

Every edge is either: 



1. A Tree Edge

2. A Back Edge to an ancestor

3. A Forward Edge to a descendant

4. A Cross Edge to a different node

On any particular DFS or BFS of a directed or undi- 
rected graph, each edge gets classified as one of the 
above. 



DFS Trees 



The reason DFS is so important is that it defines a 
very nice ordering to the edges of the graph. 

In a DFS of an undirected graph, every edge is either
a tree edge or a back edge.

Why? Suppose we have a forward edge. We would
have encountered (4,1) when expanding 4, so this is a
back edge.

Suppose we have a cross-edge. When expanding 2, we
would discover 5, so the tree would look like:



Paths in search trees 



Where is the shortest path in a DFS? 







It could use multiple back and tree edges, where BFS 
only uses tree edges. 

DFS gives a better approximation of the longest path 
than BFS. 




The BFS tree can have height 1,
independent of the length of the
longest path.




The DFS must always have height 
>= log P, where P is the length of 
the longest path. 



Give an efficient algorithm to test if a graph is bipar- 
tite. 



Bipartite means the vertices can be colored red or black 
such that no edge links vertices of the same color. 




Suppose we color a vertex red - what color must its 
neighbors be? black! 

We can augment either BFS or DFS: when we first dis-
cover a new vertex, color it opposite its parent, and
for each other edge, check it doesn't link two vertices
of the same color. The first vertex in any connected
component can be red or black!
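A BFS-based version of this test could be sketched as follows, assuming an undirected graph stored as a dictionary of neighbor lists. The name `two_color` and the convention of returning None for non-bipartite graphs are choices made here.

```python
from collections import deque

def two_color(adj):
    """Return a red/black coloring if the graph is bipartite, else None."""
    color = {}
    for start in adj:                      # handle every connected component
        if start in color:
            continue
        color[start] = "red"               # the first vertex may be either color
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    # Newly discovered: color it opposite its parent.
                    color[v] = "black" if color[u] == "red" else "red"
                    queue.append(v)
                elif color[v] == color[u]:
                    return None            # an edge links two same-colored vertices
    return color
```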

Bipartite graphs arise in many situations, and special 
algorithms are often available for them. What is the 
interpretation of a bipartite "had-sex-with" graph? 

How would you break people into two groups such 
that no group contains a pair of people who hate each 
other? 



Give an O(n) algorithm to test whether an undirected
graph contains a cycle.



If you do a DFS, you have a cycle iff you have a back
edge. This gives an O(n + m) algorithm. But where
does the m go? If the graph contains more than n − 1
edges, it must contain a cycle! Thus we never need
look at more than n edges if we are given an adjacency
list representation!
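A sketch of the back-edge test, assuming a simple undirected graph (no loops or multiedges) given as a dictionary of neighbor lists; the name `has_cycle` is an assumption. The search stops at the first back edge, so it never examines more than about n edges.

```python
def has_cycle(adj):
    """Return True iff the simple undirected graph adj contains a cycle."""
    seen = set()

    def visit(u, parent):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                if visit(v, u):
                    return True
            elif v != parent:
                return True        # back edge to an already-seen vertex
        return False

    # any() is lazy, so we stop as soon as one component reports a cycle.
    return any(u not in seen and visit(u, None) for u in adj)
```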



Topological Sorting 

A directed, acyclic graph (DAG) is a directed graph
with no directed cycles.





A topological sort of a graph is an ordering on the 
vertices so that all edges go from left to right. 

Only a DAG can have a topological sort. 







Any DAG has (at least one) topological sort. 



Applications of Topological
Sorting

Topological sorting is often useful in scheduling jobs 
in their proper sequence. In general, we can use it to 
order things given constraints, such as a set of left- 
right constraints on the positions of objects. 

Example: Dressing schedule from CLR. 

Example: Identifying errors in DNA fragment assembly. 



Certain fragments are constrained to be to the left or 
right of other fragments, unless there are errors. 

ABRACADABRA

ABRAC      ABRAC
ACADA      RACAD
ADABR      ACADA
DABRA      ADABR
RACAD      DABRA



Solution - build a DAG representing all the left-right
constraints. Any topological sort of this DAG is a
consistent ordering. If there are cycles, there must be
errors.

A DFS can test if a graph is a DAG (it is iff there are
no back edges - forward edges are allowed for DFS on
a directed graph).



Algorithm 



Theorem: Arranging vertices in decreasing order of 
DFS finishing time gives a topological sort of a DAG. 

Proof: Consider any directed edge (u,v), when we en-
counter it during the exploration of vertex u:

• If v is white - we then start a DFS of v before we
continue with u, so v finishes first and f[v] < f[u].

• If v is grey - then (u,v) is a back edge, which cannot
happen in a DAG.

• If v is black - we have already finished with v, so
f[v] < f[u].

Thus we can do topological sorting in O(n + m) time.
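The theorem suggests the following sketch: run a DFS, record each vertex as it finishes, and reverse that list. The graph is assumed to be a dictionary of out-neighbor lists; the function name and the choice to raise ValueError on a grey (back-edge) target are assumptions.

```python
def topological_sort(adj):
    """Return the vertices of a DAG in decreasing order of DFS finish time."""
    state = {u: "white" for u in adj}
    order = []                              # vertices in increasing finish time

    def visit(u):
        state[u] = "grey"
        for v in adj[u]:
            if state[v] == "white":
                visit(v)
            elif state[v] == "grey":
                raise ValueError("back edge found: the graph is not a DAG")
        state[u] = "black"
        order.append(u)

    for u in adj:
        if state[u] == "white":
            visit(u)
    order.reverse()                         # decreasing finish time
    return order
```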



Articulation Vertices 

Suppose you are a terrorist, seeking to disrupt the tele- 
phone network. Which station do you blow up? 




An articulation vertex is a vertex of a connected graph 
whose deletion disconnects the graph. 

Clearly connectivity is an important concern in the de-
sign of any network.

Articulation vertices can be found in O(n(m + n)) -
just delete each vertex and do a DFS on the remaining
graph to see if it is connected.



A Faster O(n + m) DFS
Algorithm

Theorem: In a DFS tree, a vertex v (other than the
root) is an articulation vertex iff v is not a leaf and
some subtree of v has no back edge incident on a
proper ancestor of v.

The root is a special case since 
it has no ancestors. 

X is an articulation vertex since 
the right subtree does not have 
a back edge to a proper ancestor. 




Leaves cannot be 
articulation vertices 



Proof: (1) v is an articulation vertex → v cannot be a
leaf.

Why? Deleting v must separate a pair of vertices x
and y. Because of the other tree edges, this cannot
happen unless y is a descendant of v.




v separating x,y implies there is no back edge in the
subtree of y to a proper ancestor of v.



(2) Conditions → v is a non-root articulation vertex. v
separates any ancestor of v from any descendant in the
appropriate subtree.

Actually implementing this test in O(n + m) is tricky -
but believable once you accept this theorem.



Strongly Connected
Components

A directed graph is strongly connected iff there is a 
directed path between any two vertices. 

The strongly connected components of a graph is a 
partition of the vertices into subsets (maximal) such 
that each subset is strongly connected. 




Observe that no vertex can be in two maximal compo- 
nents, so it is a partition. 




There is an amazingly elegant, linear time algorithm to 
find the strongly connected components of a directed 
graph, using DFS. 

• Call DFS(G) to compute finishing times for each
vertex.



• Compute the transpose graph G^T (reverse all edges
in G)

• Call DFS(G^T), but order the vertices in decreasing
order of finish time.

• The vertices of each DFS tree in the forest of
DFS(G^T) form a strongly connected component.
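The four steps above can be sketched as follows. The graph is assumed to be a dictionary of out-neighbor lists, and the names `strong_components` and `visit` are introduced here for illustration.

```python
def strong_components(adj):
    """Return the strongly connected components of adj as a list of sets."""
    seen = set()

    def visit(u, graph, out):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v, graph, out)
        out.append(u)                 # u is appended when it finishes

    # Step 1: DFS(G), recording vertices in increasing finish time.
    order = []
    for u in adj:
        if u not in seen:
            visit(u, adj, order)

    # Step 2: build the transpose graph by reversing every edge.
    transpose = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            transpose[v].append(u)

    # Steps 3 and 4: DFS on the transpose in decreasing finish time;
    # each tree of this forest is one strong component.
    seen.clear()
    components = []
    for u in reversed(order):
        if u not in seen:
            tree = []
            visit(u, transpose, tree)
            components.append(set(tree))
    return components
```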

This algorithm takes O(n + m), but why does it com-
pute strongly connected components?

Lemma: If two vertices are in the same strong com- 
ponent, no path between them ever leaves the compo- 
nent. 




X must also be in 
the strong component! 



Lemma: In any DFS forest, all vertices in the same 
strongly connected component are in the same tree. 

Proof: Consider the first vertex v in the component to
be discovered. Everything in the component is reachable
from it, so we will traverse it before finishing with v.



What does DFS(G^T, v) Do?

It tells you what vertices have directed paths to v,
while DFS(G, v) tells what vertices have directed paths
from v. But why must any vertex in the search tree of
DFS(G^T, v) also have a path from v?





GT 



Because there is no edge from any previous DFS tree
into the last tree!! Because we ordered the vertices
by decreasing order of finish time, we can peel off the
strongly connected components from right to left just
by doing a DFS(G^T).



Example of Strong 
Components Algorithm 




9, 10, 11, 12 can reach 9, oldest remaining finished is 5. 
5,6,8 can reach 5, oldest remaining is 7. 
7 can reach 7, oldest remaining is 1. 
1,2,3 can reach 1, oldest remaining is 4. 
4 can reach 4. 




DFS(G): 9 is the last vertex to finish



Show that you can topologically sort in O(n + m) by
repeatedly deleting vertices of indegree 0.



The correctness of this algorithm follows since in a
DAG there must always be a vertex of indegree 0, and
such a vertex can be first in topological sort. Suppose
each vertex is initialized with its indegree (do DFS on
G to get this). Deleting a vertex takes O(degree v).
Reduce the indegree of each adjacent vertex - and keep
a list of indegree-0 vertices to delete next.

Time: Σ_{i=1}^{n} O(deg(v_i)) = O(n + m)
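A sketch of this deletion-based approach is shown below, assuming the DAG is a dictionary of out-neighbor lists. Indegrees are computed here by a direct scan of the edges rather than by a DFS, and the function name is an assumption.

```python
from collections import deque

def topological_sort_by_indegree(adj):
    """Topologically sort a DAG by repeatedly deleting indegree-0 vertices."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    ready = deque(u for u in adj if indegree[u] == 0)   # vertices to delete next
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:             # "deleting" u costs O(outdegree of u)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(adj):
        raise ValueError("graph has a directed cycle")
    return order
```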



Minimum Spanning Trees 

A tree is a connected graph with no cycles. A spanning 
tree is a subgraph of G which has the same set of 
vertices of G and is a tree. 

A minimum spanning tree of a weighted graph G is 
the spanning tree of G whose edges sum to minimum 
weight. 

There can be more than one minimum spanning tree
in a graph - consider a graph with identical weight
edges.

The minimum spanning tree problem has a long history
- the first algorithm dates back at least to 1926!

Minimum spanning tree is always taught in algorithm 
courses since (1) it arises in many applications, (2) it is 
an important example where greedy algorithms always 
give the optimal answer, and (3) clever data structures
are necessary to make it work. 

In greedy algorithms, we make the decision of what 
next to do by selecting the best local option from all 
available choices - without regard to the global struc- 
ture. 



Applications of Minimum 
Spanning Trees 

Minimum spanning trees are useful in constructing net- 
works, by describing the way to connect a set of sites 
using the smallest total amount of wire. Much of the 
work on minimum spanning (and related Steiner) trees 
has been conducted by the phone company. 

Minimum spanning trees provide a reasonable way for 
clustering points in space into natural groups. 

When the cities are points in the Euclidean plane, the 
minimum spanning tree provides a good heuristic for 
traveling salesman problems. The optimum traveling 
salesman tour is at most twice the length of the mini- 
mum spanning tree. 






Prim's Algorithm 

If G is connected, every vertex will appear in the mini- 
mum spanning tree. If not, we can talk about a mini- 
mum spanning forest. 

Prim's algorithm starts from one vertex and grows the 
rest of the tree an edge at a time. 

As a greedy algorithm, which edge should we pick?
The cheapest edge with which we can grow the tree by
one vertex without creating a cycle.

During execution we will label each vertex as either in 
the tree, fringe - meaning there exists an edge from 
a tree vertex, or unseen - meaning the vertex is more 
than one edge away. 

Select an arbitrary vertex to start.

While (there are fringe vertices)
    select minimum weight edge between tree and fringe
    add the selected edge and vertex to the tree

Clearly this creates a spanning tree, since no cycle can 
be introduced via edges between tree and fringe ver- 
tices, but is it minimum? 



Why is Prim's algorithm correct?

Don't be scared by the proof - the reason is really quite 
basic: 

Theorem: Let G be a connected, weighted graph and
let E' ⊂ E be a subset of the edges in a MST T =
(V, E_T). Let V' be the vertices incident with edges in
E'. If (x,y) is an edge of minimum weight such that
x ∈ V' and y is not in V', then E' ∪ {(x,y)} is a subset of
a minimum spanning tree.

Proof: If the edge is in T, this is trivial.

Suppose (x,y) is not in T. Then there must be a path
in T from x to y since T is connected. If (v,w) is the
first edge on this path with exactly one endpoint in V',
then deleting it and replacing it with (x,y) gives a
spanning tree.

This tree must have smaller weight than T, since
W(v,w) > W(x,y). Thus T could not have been the MST.







Prim's Algorithm is correct! 

Thus we cannot go wrong with the greedy strategy the 
way we could with the traveling salesman problem. 



But how fast is Prim's? 

That depends on what data structures are used. In 
the simplest implementation, we can simply mark each 
vertex as tree and non-tree and search always from 
scratch: 

Select an arbitrary vertex to start.

While (there are non-tree vertices)
    select minimum weight edge between tree and fringe
    add the selected edge and vertex to the tree

This can be done in O(nm) time, by doing a DFS or
BFS to loop through all edges, with a constant time
test per edge, and a total of n iterations.

Can we do faster? If so, we need to be able to identify
fringe vertices and the minimum cost edge associated
with each, fast. We will augment an adjacency list with
fields maintaining fringe information.

Vertex:
    fringelink      pointer to next vertex in fringe list.
    fringeweight    cheapest edge linking v to the tree.
    parent          other vertex with v having fringeweight.
    status          intree, fringe, unseen.
    adjacency list  the list of edges.



Finding the minimum weight fringe-edge takes O(n)
time - just bump through fringe list.

After adding a vertex to the tree, running through its
adjacency list to update the cost of adding fringe ver-
tices (there may be a cheaper way through the new
vertex) can be done in O(n) time.

Total time is O(n^2).
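A minimal O(n^2) sketch in this spirit is shown below. It assumes the graph is given as a symmetric weight matrix with `math.inf` for missing edges instead of the augmented adjacency list described above, and the names `prim`, `fringe_weight`, and `parent` are illustrative.

```python
import math

def prim(weight):
    """Return MST edges (parent, vertex, weight) of a weight matrix, starting at 0."""
    n = len(weight)
    in_tree = [False] * n
    fringe_weight = [math.inf] * n    # cheapest known edge into the tree
    parent = [None] * n
    fringe_weight[0] = 0              # arbitrary start vertex
    edges = []
    for _ in range(n):
        # O(n) scan for the cheapest non-tree vertex.
        v = min((u for u in range(n) if not in_tree[u]),
                key=lambda u: fringe_weight[u])
        if fringe_weight[v] == math.inf:
            break                     # remaining vertices are unreachable
        in_tree[v] = True
        if parent[v] is not None:
            edges.append((parent[v], v, weight[parent[v]][v]))
        # O(n) update: is there a cheaper way in through the new vertex?
        for w in range(n):
            if not in_tree[w] and weight[v][w] < fringe_weight[w]:
                fringe_weight[w] = weight[v][w]
                parent[w] = v
    return edges
```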



Kruskal's Algorithm 

Since an easy lower bound argument shows that every
edge must be looked at to find the minimum spanning
tree, and the number of edges m = O(n^2), Prim's al-
gorithm is optimal in the worst case. Is that all she
wrote?

The complexity of Prim's algorithm is independent of 
the number of edges. Can we do better with sparse 
graphs? Yes! 

Kruskal's algorithm is also greedy. It repeatedly adds 
the smallest edge to the spanning tree that does not 
create a cycle. Obviously, this gives a spanning tree, 
but is it minimal? 



Why is Kruskal's algorithm correct?

Theorem: Let G be a weighted graph and let E' ⊂ E.
If E' is contained in a MST T and e is the smallest edge
in E − E' which does not create a cycle, E' ∪ e ⊂ T.

Proof: As before, suppose e is not in T. Adding e to T 
makes a cycle. Deleting another edge from this cycle 
leaves a connected graph, and if it is one from E — E' 
the cost of this tree goes down. Since such an edge 
exists, T could not be a MST. 




How fast is Kruskal's 
algorithm? 

What is the simplest implementation? 

• Sort the m edges in O(mlgm) time. 

• For each edge in order, test whether it creates
a cycle in the forest we have thus far built - if so
discard, else add to forest. With a BFS/DFS, this
can be done in O(n) time (since the tree has at
most n edges).

The total time is O(mn), but can we do better?

Kruskal's algorithm builds up connected components.
Any edge where both vertices are in the same con-
nected component creates a cycle. Thus if we can
maintain which vertices are in which component fast,
we do not have to test for cycles!

Put the edges in a heap
count = 0
while (count < n − 1) do
    get next edge (v,w)
    if (component(v) ≠ component(w))
        add to T
        component(v) = component(w)
        count = count + 1

If we can test components in O(log n), we can find the
MST in O(m log m)!

Question: Is O(m log n) better than O(m log m)?



Union-Find Programs 

Our analysis that Kruskal's MST algorithm is O(m log m)
requires a fast way to test whether an edge links two
vertices in the same connected component.

Thus we need a data structure for maintaining sets
which can test if two elements are in the same set and
merge two sets together. These can be implemented
by UNION and FIND operations:

Is s_i ≡ s_j?
    t = Find(s_i)
    u = Find(s_j)
    Return (Is t = u?)

Make s_i ≡ s_j
    t = Find(s_i)
    u = Find(s_j)
    Union(t, u)



Find returns the name of the set and Union sets the 
members of t to have the same name as u. 

We are interested in minimizing the time it takes to 
execute any sequence of unions and finds. 

A simple implementation is to represent each set as a 
tree, with pointers from a node to its parent. Each 
element is contained in a node, and the name of the 
set is the key at the root: 






In the worst case, these structures can be very unbal- 
anced: 

For i = 1 to n/2 do
    UNION(i,i+1)
For i = 1 to n/2 do
    FIND(1)



We want to limit the height of our trees, which is
affected by UNIONs. When we union, we can make
the tree with fewer nodes the child.

Since the number of nodes is related to the height, 
the height of the final tree will increase only if both 
subtrees are of equal height! 

Lemma: If Union(t,v) attaches the root of v as a sub-
tree of t iff the number of nodes in t is greater than or
equal to the number in v, after any sequence of unions,
any tree with k nodes has height at most ⌊lg k⌋.

Proof: By induction on the number of nodes k. k = 1
has height 0.

Assume true up to k − 1 nodes. Let d_i be the height of
the tree t_i.



[Figure: trees t_1 and t_2 of heights d_1 and d_2 are merged
into a tree with k = k_1 + k_2 nodes; d is the height.]

If (d_1 > d_2) then
d = d_1 ≤ ⌊lg k_1⌋ ≤ ⌊lg(k_1 + k_2)⌋ = ⌊lg k⌋.

If (d_1 ≤ d_2), then k_1 ≥ k_2, and
d = d_2 + 1 ≤ ⌊lg k_2⌋ + 1 = ⌊lg 2k_2⌋ ≤ ⌊lg(k_1 + k_2)⌋ = ⌊lg k⌋.



Can we do better? 

We can do unions and finds in O(logn), good enough 
for Kruskal's algorithm. But can we do better? 

The ideal Union-Find tree has depth 1: 




N-1 leaves



On a find, if we are going down a path anyway, why 
not change the pointers to point to the root? 




FIND(4) 




This path compression will let us do better than O(n log n)
for n union-finds.

O(n)? Not quite ... Difficult analysis shows that it
takes O(n α(n)) time, where α(n) is the inverse Acker-
mann function and α(number of atoms in the universe) =
5.
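Union by size and path compression fit in a few lines, and plugging them into Kruskal's loop gives the O(m log m) algorithm discussed earlier. The sketch below is one possible arrangement; the class name `UnionFind`, the function `kruskal`, and the (weight, u, v) edge format are assumptions, and a sorted list stands in for the heap.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: repoint every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False               # already in the same component
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra            # the tree with fewer nodes becomes the child
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def kruskal(n, edges):
    """edges: list of (weight, u, v) on vertices 0..n-1. Returns the MST edges."""
    sets = UnionFind(n)
    tree = []
    for w, u, v in sorted(edges):      # consider edges by increasing weight
        if sets.union(u, v):           # different components: no cycle created
            tree.append((w, u, v))
    return tree
```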



Describe an efficient algorithm that, given an undi-
rected graph G, determines a spanning tree of G whose
largest edge weight is minimum over all spanning trees
of G.



First, make sure you understand the question 





Lower maximum edge weight Lower total weight 



"Hey, doesn't Kruskal's algorithm do something like
this?"

Certainly! Since Kruskal's algorithm considers the edges
in order of increasing weight, and stops the moment
these edges form a connected graph, the tree it gives
must minimize the maximum edge weight.

"Hey, but then why doesn't Prim's algorithm also work?" 

It gives the same thing as Kruskal's algorithm, so it 
must be true that any minimum spanning tree mini- 
mizes the maximum edge weight! 

Proof: Give me a MST and consider the largest edge 
weight. 





Deleting it disconnects the MST. If there were a lower
weight edge connecting the two subtrees, I didn't have
a MST!



Shortest Paths 

Finding the shortest path between two nodes in a graph 
arises in many different applications: 

• Transportation problems - finding the cheapest 
way to travel between two locations. 

• Motion planning - what is the most natural way 
for a cartoon character to move about a simulated 
environment. 

• Communications problems - how long will it take
for a message to get between two places? Which 
two locations are furthest apart, ie. what is the 
diameter of the network. 



Shortest Paths and Sentence 
Disambiguation 

In our work on reconstructing text typed on an (over- 
loaded) telephone keypad, we had to select which of 
many possible interpretations was most likely. 



[Figure: the reconstruction pipeline. The input
...#4483*63*2*7464#... passes through blank recognition
to give tokens such as "4483", "63", and "7464"; candidate
construction proposes words for each token (give, hive;
of, me; ping, ring, sing); sentence disambiguation selects
GIVE ME A RING.]



We constructed a graph where the vertices were the
possible words/positions in the sentence, with an edge
between possible neighboring words.



[Figure: the candidate words for each code, with edges
between candidates for adjacent codes labeled by
probabilities.]



The weight of each edge is a function of the probability
that these two words will be next to each other in a
sentence. For example, 'hive me' would be less likely
than 'give me'.

The final system worked extremely well - identifying 
over 99% of characters correctly based on grammatical 
and statistical constraints. 



Dynamic programming (the Viterbi algorithm) can be 
used on the sentences to obtain the same results, by 
finding the shortest paths in the underlying DAG. 



Finding Shortest Paths 

In an unweighted graph, the cost of a path is just the
number of edges on it, and the shortest path can be
found in $O(n + m)$ time via breadth-first search.

In a weighted graph, the weight of a path between two 
vertices is the sum of the weights of the edges on a 
path. 

BFS will not work on weighted graphs because sometimes
visiting more edges can lead to a shorter distance,
i.e. 1 + 1 + 1 + 1 + 1 + 1 + 1 < 10.

Note that there can be an exponential number of short- 
est paths between two nodes - so we cannot report all 
shortest paths efficiently. 

Note that negative cost cycles render the problem of 
finding the shortest path meaningless, since you can 
always loop around the negative cost cycle more to 
reduce the cost of the path. 

Thus in our discussions, we will assume that all edge 
weights are positive. Other algorithms deal correctly 
with negative cost edges. 

Minimum spanning trees are unaffected by negative
cost edges.



Dijkstra's Algorithm 

We can use Dijkstra's algorithm to find the shortest 
path between any two vertices s and t in G. 

The principle behind Dijkstra's algorithm is that if
s, ..., x, ..., t is the shortest path from s to t, then
s, ..., x had better be the shortest path from s to x.

This suggests a dynamic programming-like strategy, 
where we store the distance from s to all nearby nodes, 
and use them to find the shortest path to more distant 
nodes. 

The shortest path from s to s has length d(s,s) = 0. If
all edge weights are positive, the smallest edge incident
to s, say (s,x), defines d(s,x).

We can use an array to store the length of the shortest
path to each node. Initialize each to ∞ to start.

As soon as we establish the shortest path from s to a new
node x, we go through each of its incident edges to see
if there is a better way from s to other nodes through x.



known = {s}
for i = 1 to n, dist[i] = ∞
for each edge (s,v), dist[v] = d(s,v)
last = s
while (last ≠ t)
    select v such that dist[v] = min over unknown i of dist[i]
    for each (v,x), dist[x] = min(dist[x], dist[v] + w(v,x))
    last = v
    known = known ∪ {v}



Complexity → $O(n^2)$ if we use adjacency lists and a
Boolean array to mark what is known.

This is essentially the same as Prim's algorithm.

An $O(m \lg n)$ implementation of Dijkstra's algorithm
would be faster for sparse graphs, and comes from using
a heap of the vertices (ordered by distance), and
updating the distance to each vertex (if necessary) in
$O(\lg n)$ time for each edge out from freshly known
vertices.

Even better, $O(n \lg n + m)$ follows from using Fibonacci
heaps, since they permit one to do a decrease-key
operation in $O(1)$ amortized time.
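
The heap-based variant can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the adjacency-dictionary format and the name `dijkstra` are assumptions, and because Python's `heapq` has no decrease-key operation, the sketch pushes a fresh entry on every improvement and skips stale ones when they are popped.

```python
import heapq

def dijkstra(graph, s):
    """Return a dict of shortest-path distances from s.

    graph (assumed format): dict mapping each vertex to a list of
    (neighbor, weight) pairs, with all weights non-negative.
    """
    dist = {s: 0}
    known = set()
    heap = [(0, s)]                      # (tentative distance, vertex)
    while heap:
        d, v = heapq.heappop(heap)
        if v in known:                   # stale entry from an earlier push
            continue
        known.add(v)
        for x, w in graph.get(v, []):
            if x not in known and d + w < dist.get(x, float("inf")):
                dist[x] = d + w          # better way to x through v
                heapq.heappush(heap, (dist[x], x))
    return dist

example = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 7)],
    "b": [("t", 3)],
    "t": [],
}
distances = dijkstra(example, "s")
```

The heap never holds more than one entry per edge, so each heap operation still costs O(lg n) up to a constant factor.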



Give two more shortest path trees for the following 
graph: 




Run through Dijkstra's algorithm, and see where there 
are ties which can be arbitrarily selected. 

There are two choices for how to get to the third vertex 
X, both of which cost 5. 

There are two choices for how to get to vertex v, both 
of which cost 9. 



All-Pairs Shortest Path 

Notice that finding the shortest path between a pair
of vertices (s,t) in the worst case requires first finding
the shortest path from s to all other vertices in the graph.



Many applications, such as finding the center or di- 
ameter of a graph, require finding the shortest path 
between all pairs of vertices. 

We can run Dijkstra's algorithm n times (once from
each possible start vertex) to solve the all-pairs shortest
path problem in $O(n^3)$. Can we do better?

Improving the complexity is an open question, but there
is a super-slick dynamic programming algorithm which
also runs in $O(n^3)$.



Dynamic Programming and
Shortest Paths

The four-step approach to dynamic programming is: 

1. Characterize the structure of an optimal solution. 

2. Recursively define the value of an optimal solution. 

3. Compute this recurrence in a bottom-up fashion. 

4. Extract the optimal solution from computed infor- 
mation. 

From the adjacency matrix, we can construct the fol- 
lowing matrix: 

$D[i,j] = \infty$, if $i \neq j$ and $(v_i, v_j)$ is not in $E$

$D[i,j] = w(i,j)$, if $(v_i, v_j) \in E$

$D[i,j] = 0$, if $i = j$

This tells us the shortest path going through no inter- 
mediate nodes. 

There are several ways to characterize the shortest
path between two nodes in a graph. Note that the
shortest path from $i$ to $j$, $i \neq j$, using at most $M$
edges consists of the shortest path from $i$ to $k$ using
at most $M - 1$ edges $+\ w(k, j)$ for some $k$.

This suggests that we can compute all-pair shortest 
path with an induction based on the number of edges 
in the optimal path. 



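Let $d[i,j]^m$ be the length of the shortest path from $i$
to $j$ using at most $m$ edges.

What is $d[i,j]^0$?

$$d[i,j]^0 = \begin{cases} 0 & \text{if } i = j \\ \infty & \text{if } i \neq j \end{cases}$$

What if we know $d[i,j]^{m-1}$ for all $i,j$?

$$d[i,j]^m = \min\left(d[i,j]^{m-1},\ \min_k \left(d[i,k]^{m-1} + w[k,j]\right)\right) = \min_{1 \le k \le n} \left(d[i,k]^{m-1} + w[k,j]\right)$$

since $w[k,k] = 0$.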

This gives us a recurrence, which we can evaluate in a 
bottom up fashion: 

for i = 1 to n
    for j = 1 to n
        d[i,j]^m = ∞
        for k = 1 to n
            d[i,j]^m = min(d[i,j]^m, d[i,k]^(m-1) + w[k,j])

This is an $O(n^3)$ algorithm just like matrix multiplication,
but it only goes from $m$ to $m + 1$ edges.



Since the shortest path between any two nodes must
use at most $n$ edges (unless we have negative cost
cycles), we must repeat that procedure $n$ times ($m = 1$
to $n$) for an $O(n^4)$ algorithm.

We can improve this to $O(n^3 \log n)$ with the observation
that any path using at most $2m$ edges is a function
of paths using at most $m$ edges each. This is just like
computing $a^n = a^{n/2} \times a^{n/2}$, so a logarithmic number
of multiplications suffices for exponentiation.

Although this is slick, observe that even $O(n^3 \log n)$ is
slower than running Dijkstra's algorithm starting from
each vertex!
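
The doubling idea can be sketched as repeated squaring under a "min-plus" matrix product. The function names and the small example matrix below are illustrative assumptions; the weight matrix follows the convention described earlier (0 on the diagonal, infinity for missing edges), and no negative cycles are assumed.

```python
INF = float("inf")

def min_plus(a, b):
    """One 'multiplication': c[i][j] = min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_by_squaring(w):
    """All-pairs shortest path lengths from weight matrix w."""
    n = len(w)
    d, edges = w, 1              # d covers paths of at most `edges` edges
    while edges < n - 1:
        d = min_plus(d, d)       # squaring doubles the number of edges covered
        edges *= 2
    return d

w = [[0,   3,   INF, 7],
     [8,   0,   2,   INF],
     [5,   INF, 0,   1],
     [2,   INF, INF, 0]]
d = all_pairs_by_squaring(w)
```

Each call to `min_plus` costs O(n^3), and only about lg n calls are needed.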



The Floyd-Warshall Algorithm 

An alternate recurrence yields a more efficient dynamic 
programming formulation. Number the vertices from 
1 to n. 

Let $d[i,j]^k$ be the length of the shortest path from $i$
to $j$ using only vertices from $1, 2, \ldots, k$ as possible
intermediate vertices.

What is $d[i,j]^0$? With no intermediate vertices, any
path consists of at most one edge, so $d[i,j]^0 = w[i,j]$.

In general, adding a new vertex $k + 1$ helps iff a path
goes through it, so

$$d[i,j]^k = \begin{cases} w[i,j] & \text{if } k = 0 \\ \min\left(d[i,j]^{k-1},\ d[i,k]^{k-1} + d[k,j]^{k-1}\right) & \text{if } k \ge 1 \end{cases}$$

Although this looks similar to the previous recurrence, 
it isn't. The following algorithm implements it: 

d^0 = w
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            d[i,j]^k = min(d[i,j]^(k-1), d[i,k]^(k-1) + d[k,j]^(k-1))

This obviously runs in $\Theta(n^3)$ time, which asymptotically
is no better than $n$ calls to Dijkstra's algorithm.
However, the loops are so tight and it is so short and
simple that it runs better in practice by a constant
factor.
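
A direct Python transcription shows how short the algorithm is. This is a sketch under assumptions: the function name and example matrix are illustrative, vertices are numbered from 0, and a single matrix is updated in place, which is safe because row k and column k do not change during round k.

```python
INF = float("inf")

def floyd_warshall(w):
    """Return all-pairs shortest path lengths for weight matrix w.

    w[i][j] is the edge weight, INF if there is no edge, 0 if i == j.
    """
    n = len(w)
    d = [row[:] for row in w]            # d^0 = w
    for k in range(n):                   # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

w = [[0,   3,   INF, 7],
     [8,   0,   2,   INF],
     [5,   INF, 0,   1],
     [2,   INF, INF, 0]]
shortest = floyd_warshall(w)
```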



Give an $O(n \lg k)$-time algorithm which merges $k$ sorted
lists with a total of $n$ elements into one sorted list
(hint: use a heap to speed up the elementary
$O(kn)$-time algorithm).



The elementary algorithm compares the heads of each
of the $k$ sorted lists to find the minimum element, puts
this in the sorted list and repeats. The total time is
$O(kn)$.

Suppose instead that we build a heap on the head elements
of each of the $k$ lists, with each element labeled
as to which list it is from. The minimum element can
be found and deleted in $O(\lg k)$ time. Further, we can
insert the new head of this list in the heap in $O(\lg k)$
time. Since each of the $n$ elements is inserted and
deleted once, the total time is $O(n \lg k)$.

An alternate $O(n \lg k)$ approach would be to merge the
lists pairwise as in mergesort, using a binary tree on $k$
leaves (one for each list).
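
The heap-based merge can be sketched in Python with the standard `heapq` module. The function name `merge_k` is an illustrative assumption; each heap entry carries the list index and position so the next head of the same list can be inserted after a deletion.

```python
import heapq

def merge_k(lists):
    """Merge k sorted lists into one sorted list in O(n lg k) time."""
    # One heap entry per non-empty list: (head value, list index, position).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, i, pos = heapq.heappop(heap)      # delete the minimum head
        merged.append(value)
        if pos + 1 < len(lists[i]):              # insert the new head of list i
            heapq.heappush(heap, (lists[i][pos + 1], i, pos + 1))
    return merged

result = merge_k([[1, 4, 9], [2, 3, 10], [], [5]])
```

The standard library's `heapq.merge` provides the same behavior as a lazy iterator.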



Combinatorial Search 

We have seen how clever algorithms can reduce sorting
from $O(n^2)$ to $O(n \log n)$. However, the stakes are even
higher for combinatorially explosive problems:



The Traveling Salesman Problem 

Given a weighted graph, find the shortest cycle which 
visits each vertex once. 




Applications include minimizing plotter movement, printed- 
circuit board wiring, transportation problems, etc. 

There is no known polynomial time algorithm (i.e. $O(n^k)$
for some fixed $k$) for this problem, so search-based
algorithms are the only way to go if you need an optimal
solution.



But I want to use a 
Supercomputer 

Moving to a faster computer can only buy you a
relatively small improvement:

• Hardware clock rates on the fastest computers 
only improved by a factor of 6 from 1976 to 1989, 
from 12ns to 2ns. 

• Moving to a machine with 100 processors can only 
give you a factor of 100 speedup, even if your 
job can be perfectly parallelized (but of course it 
can't). 

• The fast Fourier transform (FFT) reduced computation
from $O(n^2)$ to $O(n \lg n)$. This is a speedup
of 340 times on n = 4096 and revolutionized the
field of image processing.

• The fast multipole method for n-particle interaction
reduced the computation from $O(n^2)$ to $O(n)$.
This is a speedup of 4000 times on n = 4096.



Can Eight Pieces Cover a
Chess Board?

Consider the 8 main pieces in chess (king, queen, two 
rooks, two bishops, two knights). Can they be posi- 
tioned on a chessboard so every square is threatened? 

























[Figure: a chessboard with the eight pieces (K, Q, two R,
two B, two N) placed on it.]























Only 63 squares are threatened in this configuration.
Since 1849, no one had been able to find an arrangement
with bishops on different colors to cover all squares.

Of course, this is not an important problem, but we will 
use it as an example of how to attack a combinatorial 
search problem. 



How many positions to test? 



Picking a square for each piece gives us the bound: 

64!/(64 - 8)! = 178,462,987,637,760 $\approx 1.8 \times 10^{14}$

Anything much larger than 10^ is unreasonable to search 
on a modest computer in a modest amount of time. 

However, we can exploit symmetry to save work. With
reflections along horizontal, vertical, and diagonal axes,
the queen can go in only 10 non-equivalent positions.

Even better, we can restrict the white bishop to 16 
spots and the queen to 16, while being certain that we 
get all distinct configurations. 







16 × 16 × 32 × 64 × 2080 × 2080 = 2,268,279,603,200 $\approx 10^{12}$



Backtracking



Backtracking is a systematic way to go through all the 
possible configurations of a search space. 

In the general case, we assume our solution is a vector
$v = (a_1, a_2, \ldots, a_n)$ where each element $a_i$ is selected
from a finite ordered set $S_i$.

We build from a partial solution of length $k$,
$v = (a_1, a_2, \ldots, a_k)$, and try to extend it by adding
another element. After extending it, we will test whether
what we have so far is still possible as a partial solution.

If it is still a candidate solution, great. If not, we delete
$a_k$ and try the next element from $S_k$.

Compute S_1, the set of candidate first elements of v.
k = 1
While k > 0 do
    While S_k ≠ ∅ do (*advance*)
        a_k = an element in S_k
        S_k = S_k − a_k
        if (a_1, a_2, ..., a_k) is a solution, print!
        k = k + 1
        compute S_k, the candidate kth elements given v.
    k = k − 1 (*backtrack*)



Recursive Backtracking 

Recursion can be used for elegant and easy implemen- 
tation of backtracking. 

Backtrack(a, k)
    if a is a solution, print(a)
    else {
        k = k + 1
        compute S_k
        while S_k ≠ ∅ do
            a_k = an element in S_k
            S_k = S_k − a_k
            Backtrack(a, k)
    }

Backtracking can easily be used to iterate through all 
subsets or permutations of a set. 

Backtracking ensures correctness by enumerating all 
possibilities. 

For backtracking to be efficient, we must prune the 
search space. 



Constructing all Subsets 

How many subsets are there of an n-element set? 

To construct all $2^n$ subsets, set up an array/vector
of $n$ cells, where the value of $a_i$ is either true or false,
signifying whether the $i$th item is or is not in the subset.

To use the notation of the general backtrack algorithm,
$S_k = \{true, false\}$, and $v$ is a solution whenever $k \ge n$.

What order will this generate the subsets of {1,2,3}? 



(1) → (1,2) → (1,2,3)* →
(1,2,-)* → (1,-) → (1,-,3)* →
(1,-,-)* → (1,-) → (1) →
(-) → (-,2) → (-,2,3)* →
(-,2,-)* → (-,-) → (-,-,3)* →
(-,-,-)* → (-,-) → (-) → ()
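
A recursive Python sketch of this construction follows. The function name and the use of a shared list `a` for the partial solution are illustrative assumptions; the candidate set at every level is (True, False), tried in that order.

```python
def all_subsets(items):
    """Enumerate every subset of items by backtracking over true/false choices."""
    n = len(items)
    a = [None] * n          # a[i] says whether items[i] is in the subset
    found = []

    def backtrack(k):
        if k == n:          # all n cells filled: a is a solution
            found.append([items[i] for i in range(n) if a[i]])
            return
        for choice in (True, False):    # S_k = {true, false}
            a[k] = choice
            backtrack(k + 1)

    backtrack(0)
    return found

subsets = all_subsets([1, 2, 3])
```

With this ordering of choices, the subsets come out in the same order as the starred entries in the trace: {1,2,3}, {1,2}, {1,3}, {1}, {2,3}, {2}, {3}, {}.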



Constructing all Permutations 

How many permutations are there of an n-element set? 



To construct all $n!$ permutations, set up an array/vector
of $n$ cells, where the value of $a_i$ is an integer from 1
to $n$ which has not appeared thus far in the vector,
corresponding to the $i$th element of the permutation.

To use the notation of the general backtrack algorithm,
$S_k = \{1, \ldots, n\} - v$, and $v$ is a solution whenever $k \ge n$.



(1) → (1,2) → (1,2,3)* → (1,2) → (1) → (1,3) →
(1,3,2)* → (1,3) → (1) → () → (2) → (2,1) →
(2,1,3)* → (2,1) → (2) → (2,3) → (2,3,1)* → (2,3) →
(2) → () → (3) → (3,1) → (3,1,2)* → (3,1) →
(3) → (3,2) → (3,2,1)* → (3,2) → (3) → ()
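
The same construction in Python might look like the sketch below. The function name is an illustrative assumption; the candidate set at each level is every integer from 1 to n not already in the partial solution, tried in increasing order.

```python
def all_permutations(n):
    """Enumerate all permutations of 1..n by backtracking."""
    a = []                  # the partial solution v
    found = []

    def backtrack():
        if len(a) == n:     # k >= n: v is a complete permutation
            found.append(tuple(a))
            return
        for candidate in range(1, n + 1):
            if candidate not in a:      # S_k = {1, ..., n} - v
                a.append(candidate)     # advance
                backtrack()
                a.pop()                 # backtrack

    backtrack()
    return found

perms = all_permutations(3)
```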







The n-Queens Problem 

The first use of pruning to deal with the combinatorial 
explosion was by the king who rewarded the fellow who 
discovered chess! 

In the eight queens problem, we prune whenever one queen
threatens another.
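
This pruning rule can be sketched in Python as follows. The representation is an assumption made for illustration: one queen per row, with `cols[r]` holding the column of the queen in row r, so only column and diagonal conflicts need testing.

```python
def count_queens(n):
    """Count placements of n mutually non-attacking queens on an n x n board."""
    cols = []               # cols[r] = column of the queen already placed in row r

    def safe(c):
        r = len(cols)       # row about to be filled
        return all(c != pc and abs(c - pc) != r - pr
                   for pr, pc in enumerate(cols))

    def place():
        if len(cols) == n:
            return 1
        total = 0
        for c in range(n):
            if safe(c):             # prune: skip squares threatened by earlier queens
                cols.append(c)
                total += place()
                cols.pop()
        return total

    return place()

solutions = count_queens(8)
```

Without the `safe` test the search would visit all 8^8 ways of putting one queen in each row; with it, most branches are cut off after only a few rows.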



Covering the Chess Board 

In covering the chess board, we prune whenever we 
find there is a square which we cannot cover given the 
initial configuration! 

Specifically, each piece can threaten a certain maximum
number of squares (queen 27, king 8, rook 14,
etc.). Whenever the number of unthreatened squares
exceeds the sum of the maximum coverage of the
pieces not yet placed, we can prune.

As implemented by a graduate student project, this 
backtrack search eliminates 95% of the search space, 
when the pieces are ordered by decreasing mobility. 

With precomputing the list of possible moves, this pro- 
gram could search 1,000 positions per second. But this 
is too slow! 

$10^{12} / 10^3 = 10^9$ seconds > 1000 days

Although we might further speed the program by an 
order of magnitude, we need to prune more nodes! 

By using a more clever algorithm, we eventually were 
able to prove no solution existed, in less than one day's 
worth of computing. 

You too can fight the combinatorial explosion! 



The Backtracking Contest: 

Bandwidth 

The bandwidth problem takes as input a graph G, with 
n vertices and m edges (ie. pairs of vertices). The 
goal is to find a permutation of the vertices on the line 
which minimizes the maximum length of any edge. 







The bandwidth problem has a variety of applications, 
including circuit layout, linear algebra, and optimizing 
memory usage in hypertext documents. 

The problem is NP-complete, meaning that it is ex- 
ceedingly unlikely that you will be able to find an algo- 
rithm with polynomial worst-case running time. It re- 
mains NP-complete even for restricted classes of trees. 

Since the goal of the problem is to find a permutation,
a backtracking program which iterates through all the
$n!$ possible permutations and computes the length of
the longest edge for each gives an easy $O(n! \cdot m)$
algorithm. But the goal of this assignment is to find as
practically good an algorithm as possible.
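
The easy baseline can be sketched in a few lines of Python. The function name and input format (vertices numbered 0 to n-1, edges as pairs) are illustrative assumptions, and the sketch deliberately does no pruning.

```python
from itertools import permutations

def bandwidth(n, edges):
    """Brute-force bandwidth: try all n! orderings of vertices 0..n-1.

    Returns (best maximum edge length, an ordering achieving it).
    """
    best, best_order = None, None
    for order in permutations(range(n)):
        pos = {v: p for p, v in enumerate(order)}    # position of each vertex
        longest = max((abs(pos[u] - pos[v]) for u, v in edges), default=0)
        if best is None or longest < best:
            best, best_order = longest, order
    return best, best_order

path_result = bandwidth(4, [(0, 1), (1, 2), (2, 3)])
cycle_result = bandwidth(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

A competitive program would instead extend a partial permutation one vertex at a time and abandon it as soon as some edge exceeds the best bandwidth found so far.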



The Backtracking Contest: 

Set Cover 

The set cover problem takes as input a collection of
subsets $S = \{S_1, \ldots, S_m\}$ of the universal set
$U = \{1, \ldots, n\}$. The goal is to find the smallest
subset $T$ of the subsets such that $\cup_{i=1}^{|T|} T_i = U$.





Set cover arises when you try to efficiently acquire or 
represent items that have been packaged in a fixed 
set of lots. You want to obtain all the items, while 
buying as few lots as possible. Finding a cover is easy, 
because you can always buy one of each lot. However, 
by finding a small set cover you can do the same job 
for less money. 

Since the goal of the problem is to find a subset, a
backtracking program which iterates through all the
$2^m$ possible subsets and tests whether it represents a
cover gives an easy $O(2^m \cdot nm)$ algorithm. But the
goal of this assignment is to find as practically good
an algorithm as possible.
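
A brute-force baseline might look like the Python sketch below. The function name and the example instance are illustrative assumptions. As a small variation on iterating through the subsets in arbitrary order, it tries collections in order of increasing size, so the first cover found is a smallest one.

```python
from itertools import combinations

def smallest_cover(universe, subsets):
    """Return indices of a smallest collection of subsets covering universe.

    Returns None when even the whole collection fails to cover it.
    """
    for size in range(len(subsets) + 1):
        for chosen in combinations(range(len(subsets)), size):
            covered = set().union(*(subsets[i] for i in chosen))
            if covered >= universe:          # superset test: everything is covered
                return list(chosen)
    return None

lots = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, {1}]
cover = smallest_cover({1, 2, 3, 4, 5}, lots)
```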



Rules of the Game 

1. Everyone must do this assignment separately. Just 
this once, you are not allowed to work with your 
partner. The idea is to think about the problem 
from scratch. 

2. If you do not completely understand what the prob- 
lem is, you don't have the slightest chance of pro- 
ducing a working program. Don't be afraid to ask 
for a clarification or explanation!!!!! 

3. There will be a variety of different data files of 
different sizes. Test on the smaller files first. Do 
not be afraid to create your own test files to help 
debug your program. 

4. The data files are available via the course WWW 
page. 

5. You will be graded on how fast and clever your 
program is, not on style. No credit will be given 
for incorrect programs. 

6. The programs are to run on whatever computer
you have access to, although it must be
vanilla enough that I can run the program on something
I have access to.

7. You are to turn in a listing of your program, along
with a brief description of your algorithm and any
interesting optimizations, sample runs, and the
time it takes on sample data files. Report the
largest test file your program could handle in one
minute or less of wall clock time.

8. The top five self-reported times / largest sizes will 
be collected and tested by me to determine the 
winner. 



Producing Efficient Programs 

1. Don't optimize prematurely: Worrying about 
recursion vs. iteration is counter-productive until 
you have worked out the best way to prune the 
tree. That is where the money is. 

2. Choose your data structures for a reason: What
operations will you be doing? Is ease of
insertion/deletion more crucial than fast retrieval?

When in doubt, keep it simple, stupid (KISS). 

3. Let the profiler determine where to do final
tuning: Your program is probably spending time
where you don't expect.



x is the majority element of a set S if the number of times
it occurs is > |S|/2. Give an $O(n)$ algorithm to test
whether an unsorted array S of n elements has a
majority element.



Sorting the list and checking the median element yields
an $O(n \log n)$ algorithm - correct, but too slow.

Observe that if I delete two occurrences of different
elements from the set, I have not changed the majority
element - since n is reduced by two while the count of
the majority element is decreased by at most one.

Thus we can scan the set from left to right, and keep 
count of how many times we see the first element be- 
fore we see an instance of a second element. We delete 
this pair and continue. If we are left with one element 
at the end, this is the only candidate for the majority 
element. 

We must verify that this candidate is in fact a majority
element, but that can be tested by counting in a second
$O(n)$ sweep over the data.
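
The pair-deletion scan and the verification sweep can be sketched in Python as below. The function name is an illustrative assumption; the counter plays the role of "how many unpaired copies of the current candidate remain", and the input is assumed to be a list so it can be scanned twice.

```python
def majority(seq):
    """Return the majority element of seq, or None if there is none."""
    candidate, count = None, 0
    for x in seq:
        if count == 0:              # nothing unpaired: x becomes the candidate
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1              # delete x together with one copy of the candidate
    # Second O(n) sweep: verify the only possible candidate.
    if count > 0 and sum(1 for x in seq if x == candidate) > len(seq) / 2:
        return candidate
    return None

winner = majority([3, 1, 3, 2, 3])
```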



Combinatorial Optimization 

In most of the algorithmic problems we study, we seek 
to find the best answer as quickly as possible. 

Traditional algorithmic methods fail when (1) the prob- 
lem is provably hard, or (2) the problem is not clean 
enough to lead to a nice formulation. 

In most problems, there is a natural way to (1) con- 
struct possible solutions and (2) measure how good a 
given solution is, but it is not clear how to find the 
best solution short of searching all configurations. 

Heuristic methods like simulated annealing give us a 
general approach to search for good solutions. 




Simulated Annealing 

The inspiration for simulated annealing comes from 
cooling molten materials down to solids. To end up 
with the globally lowest energy state you must cool 
slowly so things cool evenly. 

In thermodynamic theory, the likelihood of a particular
particle jumping to a higher energy state is given by:

$$e^{(E_i - E_j)/(k_B T)}$$

where $E_i$, $E_j$ denote the before/after energy states, $k_B$
is the Boltzmann constant, and $T$ is the temperature.

Since minimizing energy is a combinatorial optimiza- 
tion problem, we can mimic the physics for computing. 

Simulated-Annealing()
    Create initial solution S
    Initialize temperature t
    repeat
        for i = 1 to iteration-length do
            Generate a random transition from S to S_i
            If (C(S) ≥ C(S_i)) then S = S_i
            else if (e^((C(S) − C(S_i))/(k·t)) > random[0, 1))
                then S = S_i
        Reduce temperature t
    until (no change in C(S))
    Return S
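
A generic Python sketch of this loop follows, written for cost minimization. Several details are assumptions rather than part of the pseudocode: the parameter names and defaults, stopping at a temperature floor `t_min` instead of testing for "no change", folding the constant k into the temperature, and remembering the best solution seen. The toy problem at the end is purely illustrative.

```python
import math
import random

def simulated_annealing(initial, cost, neighbor, t=1.0, alpha=0.9,
                        iteration_length=100, t_min=1e-3, rng=random):
    """Minimize cost(s), starting from initial and moving via neighbor(s, rng)."""
    s = initial
    best = s
    while t > t_min:                         # assumed stop rule: temperature floor
        for _ in range(iteration_length):
            candidate = neighbor(s, rng)     # random transition from s
            delta = cost(candidate) - cost(s)
            # Always accept improvements; accept a worse solution with
            # probability e^(-delta / t), which shrinks as t cools.
            if delta <= 0 or math.exp(-delta / t) > rng.random():
                s = candidate
                if cost(s) < cost(best):
                    best = s
        t *= alpha                           # exponential cooling
    return best

# Toy problem: find the integer minimizing (x - 7)^2 by +/-1 steps.
cost = lambda x: (x - 7) ** 2
step = lambda x, rng: x + rng.choice((-1, 1))
result = simulated_annealing(0, cost, step, rng=random.Random(0))
```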



Components of Simulated 

Annealing 

There are three components to any simulated anneal- 
ing algorithm for combinatorial search: 

• Concise problem representation - The problem rep- 
resentation includes both a representation of the 
solution space and an appropriate and easily com- 
putable cost function C{s) measuring the quality 
of a given solution. 

• Transition mechanism between solutions- To move 
from one state to the next, we need a collection of 
simple transition mechanisms that slightly modify 
the current solution. Typical transition mecha- 
nisms include swapping the position of a pair of 
items or inserting/deleting a single item. 

• Cooling schedule— These parameters govern how 
likely we are to accept a bad transition, which 
should decrease as a function of time. At the 
beginning of the search, we are eager to use ran- 
domness to explore the search space widely, so 
the probability of accepting a negative transition 
is high. The cooling schedule can be regulated by 
the following parameters: 

— Initial system temperature - Typically $t_1 = 1$.

— Temperature decrement function - Typically
$t_k = \alpha \cdot t_{k-1}$, where $0.8 \le \alpha \le 0.99$. This implies
an exponential decay in the temperature, as
opposed to a linear decay.

— Number of iterations between temperature change 
- Typically, 100 to 1,000 iterations might be 
permitted before lowering the temperature. 

— Acceptance criteria - A typical criterion is to
accept any transition from $s_i$ to $s_{i+1}$ when
$C(s_{i+1}) < C(s_i)$ and to accept a negative transition
whenever

$$e^{-(C(s_{i+1}) - C(s_i))/(c \cdot t_i)} \ge r,$$

where $r$ is a random number $0 \le r < 1$. The
constant $c$ normalizes this cost function, so
that almost all transitions are accepted at the
starting temperature.

— Stop criteria - Typically, when the value of the 
current solution has not changed or improved 
within the last iteration or so, the search is 
terminated and the current solution reported. 

We provide several examples to demonstrate how these 
components can lead to elegant simulated annealing 
algorithms for real combinatorial search problems. 



Traveling Salesman Problem 

Solution space - set of all $(n-1)!$ circular permutations.

Cost function - sum up the costs of the edges defined 
by S. 

Transition mechanism - The most obvious transition
mechanism would be to swap the current tour positions
of a random pair of vertices $S_i$ and $S_j$. This changes up
to eight edges on the tour, deleting the edges currently
adjacent to both $S_i$ and $S_j$, and adding their
replacements. Better would be to swap two edges on the tour
with two others that replace them.





Since only four edges change in the tour, the transi- 
tions can be performed and evaluated faster. Faster 
transitions mean that we can evaluate more positions 
in the given amount of time. 
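
The edge-swap transition (commonly called a 2-opt move) can be sketched in Python as below. The function names and the square example are illustrative assumptions, and the distance matrix is assumed symmetric. The point is that the change in cost is computed in constant time from the edges removed and added, without re-summing the whole tour.

```python
import math

def tour_length(tour, dist):
    """Total length of the closed tour."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt_delta(tour, dist, i, j):
    """Change in length if tour[i+1..j] is reversed, for 0 <= i < j < n."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[(j + 1) % n]
    # Edges (a,b) and (c,d) are replaced by (a,c) and (b,d).
    return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]

def two_opt_apply(tour, i, j):
    """Return the tour with the segment tour[i+1..j] reversed."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

points = [(0, 0), (1, 0), (1, 1), (0, 1)]          # corners of a unit square
dist = [[math.dist(p, q) for q in points] for p in points]
crossing = [0, 2, 1, 3]                            # uses both diagonals
delta = two_opt_delta(crossing, dist, 0, 2)
uncrossed = two_opt_apply(crossing, 0, 2)
```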

In practice, problem-specific heuristics for TSP outper- 
form simulated annealing, but the simulated annealing 
solution works admirably, considering it uses very little 
knowledge about the problem. 



Maximum Cut 

Given a weighted graph, partition the vertices to max- 
imize the weight of the edges cut. 




This NP-complete problem arises in circuit design ap- 
plications. 

Solution space - set of all $2^{n-1}$ vertex partitions,
represented as a bit string.

Cost function - the weight of the edges which are cut. 

Transition mechanism - move one vertex across the 
partition. 

$\Delta f$ = (weight of old neighbors - weight of new neighbors)



This kind of procedure seems to be the right way to 
do maxcut in practice. 



Independent Set 



An independent set of a graph $G$ is a subset of vertices
$S$ such that there is no edge with both endpoints in
$S$. The maximum independent set of a graph is the
largest such empty induced subgraph.

Solution space - set of all $2^n$ vertex subsets,
represented as a bit string.

Cost function - $C(S) = |S| - \lambda \cdot m_S / T$, where $\lambda$ is a
constant, $T$ is the temperature, and $m_S$ is the number
of edges in the subgraph induced by $S$.

The dependence of C{S) on T ensures that the search 
will drive the edges out faster as the system cools. 

Transition mechanism - move one vertex in/out of the 
subset. 

More flexibility in the search space and quicker $\Delta f$
computations result from allowing non-empty graphs
at the early stages of the cooling.



Chromatic Number 

What is the smallest number of colors needed to color 
vertices such that no edge links two vertices of the 
same color? 




The solution is complicated by the fact that many ver- 
tices have to shift (potentially) to reduce the chromatic 
number by one. 

To ensure that the proposed colorings are biased in
favor of low cardinality subsets (i.e. 28 red, 1 blue, and
1 green is better than 10 red, 10 blue, and 10 green),
we will make certain colors more expensive than others.

By weighting the colors $w_{i+1} < 2w_i - w_{i-1}$ (ex: 100,
99, 97, 93, 85, 69, 37) we get faster convergence,
although certain configurations might be cheaper than
ones achieving the chromatic number! This can be
enforced with a more complicated scheme.

By Brooks' Theorem, every graph can be colored with
$\Delta + 1$ colors. In fact $\Delta$ colors suffice unless $G$ is
complete or an odd cycle.



Solution space - all possible partitions of vertices into
$\Delta + 1$ color classes, where $\Delta$ is the maximum vertex
degree.

Cost function - $\sum_{i=1}^{\Delta+1} w_i (|V_i| - \lambda |E_i|)$, where $\lambda > 1$ is
the penalty constant.

Transition mechanism - randomly move one vertex to
another subset.



Circuit Board Placement 

In designing printed circuit boards, we are faced with 
the problem of positioning modules (typically integrated 
circuits) on the board. 

Desired criteria in a layout include (1) minimizing the 
area or aspect ratio of the board, so that it properly fits 
within the allotted space, and (2) minimizing the total 
or longest wire length in connecting the components. 

Circuit board placement is an example of the kind of 
messy, multicriterion optimization problems for which 
simulated annealing is ideally suited. 

We are given a collection of rectangular modules
$r_1, \ldots, r_n$, each with associated dimensions $h_i \times l_i$. For
each pair of modules $r_i, r_j$, we are given the number
of wires $w_{ij}$ that must connect the two modules. We
seek a placement of the rectangles that minimizes area
and wire-length, subject to the constraint that no two
rectangles overlap each other.

Solution space - The positions of each rectangle. To 
provide a discrete representation, the rectangles can be 
restricted to lie on vertices of an integer grid. 

Cost function - A natural cost function would be

$$C(S) = \lambda_{area}(S_{height} \cdot S_{width}) + \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\lambda_{wire} \cdot w_{ij} \cdot d_{ij} + \lambda_{overlap}(r_i \cap r_j)\right)$$

where $\lambda_{area}$, $\lambda_{wire}$, and $\lambda_{overlap}$ are constants governing
the impact of these components on the cost function.

Transition mechanism - moving one rectangle to a dif- 
ferent location, or swapping the position of two rect- 
angles. 



Lessons from the Backtracking 

contest 

• As predicted, the speed difference between the 
fastest programs and average program dwarfed the 
difference between a supercomputer and a micro- 
computer. Algorithms have a bigger impact on 
performance than hardware! 

• Different algorithms perform differently on differ- 
ent data. Thus even hard problems may be tractable 
on the kind of data you might be interested in. 

• None of the programs could efficiently handle all
instances for $n \approx 30$. We will find out why after
the midterm, when we discuss NP-completeness.

• Many of the fastest programs were very short and
simple (KISS). My bet is that many of the
enhancements students built into them actually slowed
them down! This is where profiling can come in
handy.

• The fast programs were often recursive. 



Winning Optimizations 

Finding a good initial solution via randomization
or heuristic improvement helped by establishing a
good upper bound, to constrict search.

Using half the largest vertex degree as a lower
bound similarly constricted search.

Pruning a partial permutation the instant an edge 
was > the target made the difference in going from 
(say) 8 to 18. 

Positioning the partial permutation vertices sepa- 
rated by b instead of 1 meant significantly earlier 
cutoffs, since any edge does the job. 

Mirror symmetry can only save a factor of 2, but 
perhaps more could follow from partitioning the 
vertices into equivalence classes by the same neigh- 
borhood. 



Among n people, a celebrity is defined as someone who
is known by everyone but does not know anyone. We
seek to identify the celebrity (if one is present) by
asking questions of the form "Hey x, do you know person
y?". Show how to find the celebrity using $O(n)$
questions.



Note that there are $n^2$ possible questions to ask, so we
cannot ask them all.

What if we ask 1 if she knows 2, and 2 if she knows 
1? If both know each other neither can be a celebrity. 
If neither know each other, neither can be a celebrity. 
If one of them knows the other, the former cannot be 
a celebrity. 

Thus in two questions we can eliminate at least one
person from celebrity status. Thus in $2(n - 1)$
questions, we have only one possible celebrity. It is now
possible to check whether the survivor is really a celebrity
using at most $2(n - 1)$ additional queries, by checking that
everyone else knows the survivor and that the survivor
knows no one else.
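
A Python sketch of the elimination idea follows. It is a slight variation rather than the exact procedure above: a single question per pair already eliminates one person (if x knows y then x is out, otherwise y is out), so n - 1 questions leave one candidate. The function name and the callback `knows(x, y)` are illustrative assumptions.

```python
def find_celebrity(n, knows):
    """Return the celebrity among people 0..n-1, or None if there is none.

    knows(x, y) answers the question "Hey x, do you know person y?".
    """
    if n == 0:
        return None
    candidate = 0
    for person in range(1, n):
        if knows(candidate, person):    # candidate knows someone: eliminated
            candidate = person
        # otherwise person is unknown to candidate: person is eliminated
    for other in range(n):              # verify the lone survivor
        if other != candidate:
            if knows(candidate, other) or not knows(other, candidate):
                return None
    return candidate

relation = [[False, True, True],
            [False, False, False],
            [True, True, False]]
celebrity = find_celebrity(3, lambda x, y: relation[x][y])
```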



An Eulerian cycle in a graph visits each edge exactly
once. A graph contains an Eulerian cycle iff it is
connected and the degree of each vertex is even. Give
an $O(|E|)$ algorithm to find an Eulerian cycle if one
exists.



Observe that a cycle of edges defines a graph where
each vertex is of degree 2. Thus deleting a cycle from 
an Eulerian graph leaves each vertex with even degree, 
although the graph may not be connected. 

We can use depth-first search to decompose the edges 
of a graph into cycles. If the graph was connected, 
these cycles must link together. Splicing them to- 
gether gives an Eulerian cycle. For example, the cycle 
(1,2,3,1) and (4,5,6,1,4) can be spliced together as 
(4,5,6,1,2,3,1,4). 
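
One way to realize the splicing in linear time is the stack-based form of Hierholzer's algorithm, sketched below in Python. This is a swapped-in formulation, not necessarily the one intended above: the stack walks along unused edges until it gets stuck, and vertices are emitted as the walk unwinds, which splices the sub-cycles together implicitly. The function name and input format are illustrative assumptions, and the graph is assumed to be connected with every degree even.

```python
def eulerian_cycle(n, edges):
    """Return an Eulerian cycle as a vertex list, for vertices 0..n-1.

    Assumes the graph is connected and every vertex has even degree.
    """
    if not edges:
        return []
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    stack, cycle = [edges[0][0]], []
    while stack:
        v = stack[-1]
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()                # discard edges already traversed
        if adj[v]:
            x, idx = adj[v].pop()       # follow an unused edge out of v
            used[idx] = True
            stack.append(x)
        else:
            cycle.append(stack.pop())   # stuck at v: emit it while unwinding
    return cycle

bowtie = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
cycle = eulerian_cycle(5, bowtie)
```

Every adjacency entry is popped at most once, so the total work is O(|E|).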

Although Eulerian cycle has an efficient algorithm, the 
Hamiltonian cycle problem (visit each vertex exactly 
once) is NP-complete. 



The Theory of 
NP-Completeness 

Several times this semester we have encountered prob- 
lems for which we couldn't find efficient algorithms, 
such as the traveling salesman problem. We also couldn't 
prove an exponential time lower bound for the problem. 



By the early 1970s, literally hundreds of problems were
stuck in this limbo. The theory of NP-Completeness,
developed by Stephen Cook and Richard Karp, provided
the tools to show that all of these problems were really
the same problem.



The Main Idea

Suppose I gave you the following algorithm to solve the 
bandersnatch problem: 

Bandersnatch(G) 

Convert G to an instance of the Bo-billy problem Y. 
Call the subroutine Bo-billy on Y to solve this instance. 
Return the answer of Bo-billy(Y) as the answer to G.

Such a translation from instances of one type of prob- 
lem to instances of another type such that answers are 
preserved is called a reduction. 

Now suppose my reduction translates G to Y in O(P(n)):

1. If my Bo-billy subroutine ran in O(P'(n)) I can
solve the Bandersnatch problem in O(P(n) + P'(n)).

2. If I know that Ω(P'(n)) is a lower bound to com-
pute Bandersnatch, then Ω(P'(n) - P(n)) must be
a lower bound to compute Bo-billy.

The second argument is the idea we use to prove prob- 
lems hard! 



Convex Hull and Sorting 

A nice example of a reduction goes from sorting num-
bers to the convex hull problem:

We must translate each number to a point. We can
map x to (x, x^2).

Why? That means each integer is mapped to a point
on the parabola y = x^2.



Since this parabola is convex, every point is on the
convex hull. Further, since neighboring points on the
convex hull have neighboring x values, the convex hull
returns the points sorted by x-coordinate, i.e. the orig-
inal numbers.

Sort(S)

For each i ∈ S, create point (i, i^2).

Call subroutine convex-hull on this point set.

From the leftmost point in the hull,

read off the points from left to right.

Creating and reading off the points takes O(n) time.

What does this mean? Recall the sorting lower bound
of Ω(n lg n). If we could do convex hull in better than
n lg n, we could sort faster than Ω(n lg n) - which vio-
lates our lower bound.

Thus convex hull must take Ω(n lg n) as well!!!

Observe that any O(n lg n) convex hull algorithm also
gives us a complicated but correct O(n lg n) sorting
algorithm as well.
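
The reduction can be tried out directly. The sketch below is illustrative only: `convex_hull` is a simple gift-wrapping routine that takes O(n^2) time, chosen because it does not sort internally, and `sort_via_hull` is a hypothetical name for the Sort(S) procedure above. The input numbers are assumed to be distinct.

```python
def cross(o, a, b):
    """Positive if o -> a -> b turns counterclockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Gift wrapping: hull vertices counterclockwise from the leftmost point."""
    start = min(points)
    hull = [start]
    while True:
        current = hull[-1]
        candidate = None
        for p in points:
            if p == current:
                continue
            # Keep the point that has every other point on its left.
            if candidate is None or cross(current, candidate, p) < 0:
                candidate = p
        if candidate is None or candidate == start:
            break
        hull.append(candidate)
    return hull

def sort_via_hull(numbers):
    """Sort distinct numbers by mapping x to (x, x^2) and reading off the hull."""
    points = [(x, x * x) for x in numbers]
    return [x for x, _ in convex_hull(points)]
```

Because the parabola is convex, the counterclockwise walk from the leftmost point passes through every point in increasing x order before the final edge closes the hull.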



What is a problem? 

A problem is a general question, with parameters for 
the input and conditions on what is a satisfactory an- 
swer or solution. 

An instance is a problem with the input parameters 
specified. 

Example: The Traveling Salesman 

Problem: Given a weighted graph G, what tour $\{v_1, v_2, \ldots, v_n\}$
minimizes $\sum_{i=1}^{n-1} d[v_i, v_{i+1}] + d[v_n, v_1]$?

Instance: $d[v_1, v_2] = 10$, $d[v_1, v_3] = 5$, $d[v_1, v_4] = 9$,
$d[v_2, v_3] = 6$, $d[v_2, v_4] = 9$, $d[v_3, v_4] = 3$




Solution: $\{v_1, v_2, v_4, v_3\}$, cost = 10 + 9 + 3 + 5 = 27

A problem with answers restricted to yes and no is 
called a decision problem. Most interesting optimiza- 
tion problems can be phrased as decision problems 
which capture the essence of the computation. 

Example: The Traveling Salesman Decision Problem. 



Given a weighted graph G and integer k, does there
exist a traveling salesman tour with cost ≤ k?

Using binary search and the decision version of the
problem we can find the optimal TSP solution.

For convenience, from now on we will talk only about
decision problems.
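
To make the binary search concrete, here is a small Python sketch. The decision procedure `tsp_decision` is a brute-force stand-in (it tries every tour), and both function names are assumptions for illustration. Edge weights are assumed to be non-negative integers, with at least three vertices.

```python
from itertools import permutations

def tour_cost(d, tour):
    """Cost of the closed tour; d maps frozenset({u, v}) to a weight."""
    n = len(tour)
    return sum(d[frozenset((tour[i], tour[(i + 1) % n]))] for i in range(n))

def tsp_decision(d, vertices, k):
    """Decision version: is there a tour of cost at most k? (brute force)"""
    first, rest = vertices[0], vertices[1:]
    return any(tour_cost(d, (first,) + p) <= k for p in permutations(rest))

def optimal_cost(d, vertices):
    """Binary search on k using only yes/no answers from tsp_decision."""
    lo, hi = 0, sum(d.values())  # a tour never costs more than all edges together
    while lo < hi:
        mid = (lo + hi) // 2
        if tsp_decision(d, vertices, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# The four-vertex instance from the example above.
d = {frozenset(p): w for p, w in [
    (("v1", "v2"), 10), (("v1", "v3"), 5), (("v1", "v4"), 9),
    (("v2", "v3"), 6), (("v2", "v4"), 9), (("v3", "v4"), 3)]}
print(optimal_cost(d, ("v1", "v2", "v3", "v4")))  # 27
```

The search makes O(lg W) calls to the decision procedure, where W is the total edge weight, so a polynomial-time decision algorithm would give a polynomial-time way to find the optimal cost.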

Note that there are many possible ways to encode the 
input graph: adjacency matrices, edge lists, etc. All 
reasonable encodings will be within polynomial size of 
each other. 

The fact that we can ignore minor differences in en- 
coding is important. We are concerned with the dif- 
ference between algorithms which are polynomial and 
exponential in the size of the input. 



Satisfiability 

Consider the following logic problem: 

Instance: A set V of variables and a set of clauses C 
over V. 

Question: Does there exist a satisfying truth assign-
ment for C?

Example 1: $V = \{v_1, v_2\}$ and $C = \{\{v_1, \bar{v}_2\}, \{\bar{v}_1, v_2\}\}$

A clause is satisfied when at least one literal in it is
TRUE. C is satisfied when $v_1 = v_2 =$ TRUE.

Example 2: $V = \{v_1, v_2\}$, $C = \{\{v_1, v_2\}, \{v_1, \bar{v}_2\}, \{\bar{v}_1\}\}$

Although you try, and you try, and you try and you try, 
you can get no satisfaction. 

There is no satisfying assignment since $v_1$ must be FALSE
(third clause), so $v_2$ must be FALSE (second clause),
but then the first clause is unsatisfiable!

For various reasons, it is known that satisfiability is a 
hard problem. Every top-notch algorithm expert in the
world (and countless other, lesser lights) has tried
to come up with a fast algorithm to test whether a
given set of clauses is satisfiable, but all have failed.
Further, many strange and impossible-to-believe things 
have been shown to be true if someone in fact did find 
a fast satisfiability algorithm. 

Clearly, Satisfiability is in NP, since we can guess an 
assignment of TRUE, FALSE to the literals and check 
it in polynomial time. 
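
The difference between checking and finding can be seen in a few lines of Python. This is an illustrative sketch: the encoding (a literal is a non-zero integer, negative meaning complemented) and the names `satisfies` and `brute_force_sat` are assumptions, not part of the lecture.

```python
from itertools import product

def satisfies(clauses, assignment):
    """Polynomial-time check: every clause has at least one TRUE literal.

    A literal is an int: +i means variable i, -i means its complement.
    assignment maps variable index -> bool.
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def brute_force_sat(clauses, num_vars):
    """Exponential-time search: try all 2^n assignments."""
    for values in product([True, False], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if satisfies(clauses, assignment):
            return assignment
    return None

# Two small instances in the spirit of the examples above.
print(brute_force_sat([[1, -2], [-1, 2]], 2))     # {1: True, 2: True}
print(brute_force_sat([[1, 2], [1, -2], [-1]], 2))  # None
```

`satisfies` is the verification step that places the problem in NP; `brute_force_sat` is the only general strategy shown here, and it takes time exponential in the number of variables.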



P versus NP 

The precise distinction between whether a problem is
in P or NP is somewhat technical, requiring formal lan-
guage theory and Turing machines to state correctly.

However, intuitively a problem is in P (i.e. polynomial)
if it can be solved in time polynomial in the size of the
input.

A problem is in NP if, given the answer, it is possible to 
verify that the answer is correct within time polynomial 
in the size of the input. 

Example P - Is there a path from s to t in G of length 
less than k. 

Example NP - Is there a TSP tour in G of length less 
than k. Given the tour, it is easy to add up the costs 
and convince me it is correct. 

Example not NP - How many TSP tours are there 
in G of length less than k. Since there can be an 
exponential number of them, we cannot count them 
all in polynomial time. 

Don't let this issue confuse you - the important idea 
here is of reductions as a way of proving hardness. 



3-Satisfiability 



Instance: A collection of clauses C where each clause
contains exactly 3 literals, over a set of boolean variables v.

Question: Is there a truth assignment to v so that each
clause is satisfied?

Note that this is a more restricted problem than SAT. If
3-SAT is NP-complete, it implies SAT is NP-complete
but not vice versa. Perhaps long clauses are what make
SAT difficult?!

After all, 1-SAT is trivial!

Theorem: 3-SAT is NP-complete

Proof: 3-SAT is in NP - given an assignment, just check
that each clause is covered. To prove it is complete,
a reduction SAT ∝ 3-SAT must be provided. We
will transform each clause independently based on its
length.

Suppose the clause $C_i$ contains k literals.

• If k = 1, meaning $C_i = \{z_1\}$, create two new vari-
ables $v_1, v_2$ and four new 3-literal clauses:

$\{v_1, v_2, z_1\}$, $\{v_1, \bar{v}_2, z_1\}$, $\{\bar{v}_1, v_2, z_1\}$, $\{\bar{v}_1, \bar{v}_2, z_1\}$.

Note that the only way all four of these can be
satisfied is if $z_1$ is TRUE.

• If k = 2, meaning $\{z_1, z_2\}$, create one new variable
$v_1$ and two new clauses: $\{v_1, z_1, z_2\}$, $\{\bar{v}_1, z_1, z_2\}$.

• If k = 3, meaning $\{z_1, z_2, z_3\}$, copy into the 3-SAT
instance as it is.

• If k > 3, meaning $\{z_1, z_2, \ldots, z_n\}$, create n - 3 new
variables and n - 2 new clauses in a chain: $\{z_1, z_2, v_1\}$,
$\{\bar{v}_1, z_3, v_2\}$, ..., $\{\bar{v}_{n-3}, z_{n-1}, z_n\}$.



If none of the original literals in a clause are TRUE,
there is no way to satisfy all of them using the addi-
tional variables:

(F,F,T), (F,F,T), ..., (F,F,F)

But if any literal is TRUE, we have n - 3 free variables
and n - 3 remaining 3-clauses, so we can satisfy each of
them:

(F,F,T), (F,F,T), ..., (F,T,F), ..., (T,F,F), (T,F,F)

Since any SAT solution will also satisfy the 3-SAT in-
stance and any 3-SAT solution sets variables giving a
SAT solution, the problems are equivalent. If there
were n clauses and m total literals in the SAT instance,
this transform takes O(m) time, so SAT ∝ 3-SAT.
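
The clause-by-clause transformation can be sketched in Python. This is a sketch under stated assumptions: literals are non-zero integers (negative meaning complemented), clauses are non-empty, the long-clause case uses the standard chain construction, and the names `to_3sat` and `satisfiable` are hypothetical. The brute-force `satisfiable` is included only to compare the two instances on small inputs.

```python
from itertools import product

def to_3sat(clauses, num_vars):
    """Rewrite each clause into clauses of exactly 3 literals.

    Returns (new_clauses, new_num_vars). Fresh variables are numbered
    after the original ones.
    """
    next_var = num_vars

    def fresh():
        nonlocal next_var
        next_var += 1
        return next_var

    out = []
    for z in clauses:
        k = len(z)
        if k == 1:
            v1, v2 = fresh(), fresh()
            out += [[v1, v2, z[0]], [v1, -v2, z[0]],
                    [-v1, v2, z[0]], [-v1, -v2, z[0]]]
        elif k == 2:
            v1 = fresh()
            out += [[v1, z[0], z[1]], [-v1, z[0], z[1]]]
        elif k == 3:
            out.append(list(z))
        else:
            chain = [fresh() for _ in range(k - 3)]   # k - 3 new variables
            out.append([z[0], z[1], chain[0]])
            for j in range(1, k - 3):
                out.append([-chain[j - 1], z[j + 1], chain[j]])
            out.append([-chain[-1], z[k - 2], z[k - 1]])  # k - 2 clauses in all
    return out, next_var

def satisfiable(clauses, num_vars):
    """Brute-force check, for comparing small instances only."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

Each clause of length k produces at most max(4, k - 2) new clauses, so the output size is linear in the total number of literals.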

Note that a slight modification to this construction 
would prove 4-SAT, or 5-SAT,... also NP-complete. 
However, it breaks down when we try to use it for 2- 
SAT, since there is no way to stuff anything into the 
chain of clauses. It turns out that resolution gives a 
polynomial time algorithm for 2-SAT. 

Having at least 3 literals per clause is what makes the
problem difficult. Now that we have shown 3-SAT
is NP-complete, we may use it for further reductions.
Since the set of 3-SAT instances is smaller and more
regular than the SAT instances, it will be easier to use
3-SAT for future reductions. Remember the direction
of the reduction!

SAT ∝ 3-SAT ∝ X



A Perpetual Point of Confusion 

Note carefully the direction of the reduction. 

We must transform every instance of a known NP- 
complete problem to an instance of the problem we 
are interested in. If we do the reduction the other way, 
all we get is a slow way to solve x, by using a subroutine 
which probably will take exponential time. 

This always is confusing at first - it seems bass-ackwards. 
Make sure you understand the direction of reduction 
now - and think back to this when you get confused. 



Integer Programming 

Instance: A set v of integer variables, a set of inequal-
ities over these variables, a function f(v) to maximize,
and integer B.

Question: Does there exist an assignment of integers
to v such that all inequalities are true and f(v) ≥ B?

Example:

$v_1 \ge 1$, $v_2 \ge 0$
$v_1 + v_2 \le 3$

$f(v) = 2 v_2$, $B = 3$

A solution to this is $v_1 = 1$, $v_2 = 2$.

Example:

$v_1 \ge 1$, $v_2 \ge 0$
$v_1 + v_2 \le 3$

$f(v) = 2 v_2$, $B = 5$

Since the maximum value of f(v) given the constraints
is 2 × 2 = 4, there is no solution.

Theorem: Integer Programming is NP-Hard 

Proof: By reduction from Satisfiability 

Any SAT instance has boolean variables and clauses.
Our integer programming problem will have twice as
many variables as the SAT instance, one for each vari-
able and its complement, as well as the following in-
equalities:

For each variable $v_i$ in the SAT problem, we will add the
following constraints:

• $0 \le v_i \le 1$ and $0 \le \bar{v}_i \le 1$

Both IP variables are restricted to values of 0 or 1,
which makes them equivalent to boolean variables
restricted to true/false.

• $1 \le v_i + \bar{v}_i \le 1$

Exactly one of the IP variables associated with a
given SAT variable is 1. This means that exactly
one of $v_i$ and $\bar{v}_i$ is true!

• For each clause $C_i = \{v_1, v_2, v_3, \ldots, v_n\}$ in the SAT
instance, construct a constraint:

$v_1 + v_2 + v_3 + \cdots + v_n \ge 1$

Thus at least one IP variable must be one in each
clause! Thus satisfying the constraint is equivalent
to satisfying the clause!

Our maximization function and bound are relatively
unimportant: $f(v) = v_1$, $B = 0$.

Clearly this reduction can be done in polynomial time. 
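
The construction can be written out as a short Python sketch. The constraint representation (a list of variable indices with a lower and upper bound on their sum) and the names `sat_to_ip` and `ip_solution` are assumptions for illustration. The solver is a brute-force search over 0/1 values, which suffices here because the bound constraints rule out every other value.

```python
import math
from itertools import product

def sat_to_ip(clauses, num_vars):
    """Build the IP constraints for a SAT instance.

    SAT variable i gets IP index 2*(i-1); its complement gets 2*(i-1)+1.
    Each constraint is (indices, low, high), meaning
    low <= sum of x[j] for j in indices <= high.
    """
    def idx(lit):
        return 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)

    constraints = []
    for i in range(1, num_vars + 1):
        constraints.append(([idx(i)], 0, 1))            # 0 <= v_i <= 1
        constraints.append(([idx(-i)], 0, 1))           # 0 <= complement <= 1
        constraints.append(([idx(i), idx(-i)], 1, 1))   # exactly one is 1
    for clause in clauses:
        constraints.append(([idx(l) for l in clause], 1, math.inf))
    return constraints

def ip_solution(constraints, num_ip_vars):
    """Brute-force search over 0/1 vectors (illustration only).

    The objective f(v) = v_1 with B = 0 holds for any 0/1 vector,
    so it is not checked separately.
    """
    for x in product((0, 1), repeat=num_ip_vars):
        if all(lo <= sum(x[j] for j in idxs) <= hi
               for idxs, lo, hi in constraints):
            return x
    return None
```

The reduction emits three constraints per variable and one per clause, so it runs in time linear in the size of the SAT instance. Only `ip_solution` is exponential, which is expected.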



We must show:

1. Any SAT solution gives a solution to the IP prob-
lem.

In any SAT solution, a TRUE literal corresponds to
a 1 in the IP. If the expression is satisfied,
at least one literal per clause is TRUE, so the sum
in the inequality is ≥ 1.

2. Any IP solution gives a SAT solution.

Given a solution to this IP instance, all variables
will be 0 or 1. Set the literals corresponding to 1
variables TRUE and those corresponding to 0
variables FALSE. No boolean
variable and its complement will both be true, so
it is a legal assignment which also must satisfy the
clauses.

Neat, sweet, and NP-complete! 



Things to Notice 

1. The reduction preserved the structure of the prob- 
lem. Note that the reduction did not solve the 
problem - it just put it in a different format. 

2. The possible IP instances which result are a small 
subset of the possible IP instances, but since some 
of them are hard, the problem in general must be 
hard. 

3. The transformation captures the essence of why IP 
is hard - it has nothing to do with big coefficients 
or big ranges on variables; for restricting to 0/1 
is enough. A careful study of what properties we 
do need for our reduction tells us a lot about the 
problem. 

4. It is not obvious that IP ∈ NP, since the numbers
assigned to the variables may be too large to write 
in polynomial time - don't be too hasty! 



Give a polynomial-time algorithm to satisfy Boolean 
formulas in disjunctive normal form. 



Satisfying one clause in DNF satisfies the whole for-
mula. One clause can always be satisfied iff it does
not contain both a variable and its complement.

Why not use this reduction to give a polynomial-time
algorithm for 3-SAT? The DNF formula can become
exponentially large and hence the reduction cannot be
done in polynomial time.
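
A minimal Python sketch of the DNF test follows. The encoding is an assumption: a formula is a list of terms, each term a list of integer literals that are ANDed together, with negative integers for complements. The name `satisfy_dnf` is hypothetical.

```python
def satisfy_dnf(terms):
    """Return a satisfying partial assignment for a DNF formula, or None.

    The formula is the OR of its terms, so it is enough to find one
    term that does not contain both a variable and its complement.
    """
    for term in terms:
        literals = set(term)
        if not any(-lit in literals for lit in literals):
            # Make every literal of this term TRUE; other variables are free.
            return {abs(lit): lit > 0 for lit in literals}
    return None
```

The scan touches each literal a constant number of times (with hashing), so it is linear in the size of the formula.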



Given an integer m x n matrix A, and an integer m-
vector b, the 0-1 integer programming problem asks
whether there is an integer n-vector x with elements in
the set {0, 1} such that Ax ≤ b. Prove that 0-1 integer
programming is NP-hard (hint: reduce from 3-SAT).



This is really the exact same problem as the previous 
integer programming problem, slightly concealed by: 

• The linear algebra notation - each row is one con- 
straint. 

• All inequalities are ≤ - multiply both sides by -1
to reverse the constraint from ≥ to ≤ if necessary.



Vertex Cover 

Instance: A graph G = (V, E), and integer k ≤ |V|

Question: Is there a subset of at most k vertices such
that every e ∈ E has at least one vertex in the subset?




Here, four of the eight vertices are enough to cover. It
is trivial to find a vertex cover of a graph - just take all
the vertices. The tricky part is to cover with as small
a set as possible.

Theorem: Vertex cover is NP-complete.

Proof: VC is in NP - guess a subset of vertices, count
them, and show that each edge is covered.

To prove completeness, we show 3-SAT ∝ VC. From
a 3-SAT instance with N variables and C clauses, we
construct a graph with 2N + 3C vertices.



For each variable, we create two vertices connected by 
an edge: 



[Figure: one edge per variable, joining the vertex for the variable to the vertex for its complement]



To cover each of these edges, at least N vertices must
be in the cover, one for each pair. For each clause, we 
create three new vertices, one for each literal in each 
clause. Connect these in a triangle. 

At least two vertices per triangle must be in the cover 
to take care of edges in the triangle, for a total of at 
least 2C vertices. 



Finally, we will connect each literal in the flat structure 
to the corresponding vertices in the triangles which 
share the same literal. 




Claim: This graph will have a vertex cover of size N +
2C if and only if the expression is satisfiable.

By the earlier analysis, any cover must have at least
N + 2C vertices. To show that our reduction is correct,
we must show that:

1. Every satisfying truth assignment gives a cover. 

Select the N vertices corresponding to the TRUE
literals to be in the cover. Since it is a satisfying 
truth assignment, at least one of the three cross 
edges associated with each clause must already be 
covered - pick the other two vertices to complete 
the cover. 

2. Every vertex cover gives a satisfying truth assign- 
ment. 

Every vertex cover must contain N first stage ver-
tices and 2C second stage vertices. Let the first 
stage vertices define the truth assignment. 

To give the cover, at least one cross-edge must 
be covered, so the truth assignment satisfies. 

For a cover to have N + 2C vertices, all the cross edges
must be incident on a selected vertex.

Let the N selected vertices from the first stage corre-
spond to TRUE literals. If there is a satisfying truth
assignment, that means at least one of the three cross
edges from each triangle is incident on a TRUE vertex.

By adding the other two vertices to the cover, we cover
all edges associated with the clause.

Every satisfying assignment defines a cover, and every cover defines truth val-
ues for the SAT!
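
The graph construction is mechanical enough to write down. The sketch below is illustrative: vertex labels such as `("lit", 2)` and `("clause", 0, 1)` and the function names are assumptions, and `has_vertex_cover` is a brute-force check usable only on tiny instances.

```python
from itertools import combinations

def three_sat_to_vc(clauses, num_vars):
    """Build the vertex cover instance (vertices, edges, k) for a 3-SAT instance.

    Literals are ints: +i for variable i, -i for its complement.
    """
    edges = set()
    # First stage: one edge per variable, between v_i and its complement.
    for i in range(1, num_vars + 1):
        edges.add((("lit", i), ("lit", -i)))
    # Second stage: one triangle per clause, plus cross edges to the literals.
    for c, clause in enumerate(clauses):
        corners = [("clause", c, pos) for pos in range(3)]
        for a, b in combinations(corners, 2):
            edges.add((a, b))
        for pos, lit in enumerate(clause):
            edges.add((corners[pos], ("lit", lit)))
    vertices = {v for e in edges for v in e}
    k = num_vars + 2 * len(clauses)   # N + 2C
    return vertices, edges, k

def has_vertex_cover(vertices, edges, k):
    """Brute force: does some set of k vertices touch every edge?"""
    for subset in combinations(list(vertices), k):
        chosen = set(subset)
        if all(u in chosen or v in chosen for u, v in edges):
            return True
    return False
```

The graph has 2N + 3C vertices and N + 6C edges, so the construction itself is linear in the size of the formula.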



Example: $v_1 = v_2 =$ TRUE, $v_3 = v_4 =$ FALSE.

Starting from the Right Problem

As you can see, the reductions can be very clever
and very complicated. While theoretically any NP-
complete problem can be reduced to any other one,
choosing the correct one makes finding a reduction
much easier.

3-SAT ∝ VC




Maximum Clique 

Instance: A graph G = (V, E) and integer j ≤ |V|.

Question: Does the graph contain a clique of j vertices,
i.e. is there a subset of V of size j such that every pair
of vertices in the subset defines an edge of G?

Example: this graph contains a clique of size 5. 




When talking about graph problems, it is most natural
to work from a graph problem - the only NP-complete
one we have is vertex cover!

Theorem: Clique is NP-complete

Proof: If you take a graph and find its vertex cover, the 
remaining vertices form an independent set, meaning 
there are no edges between any two vertices in the 
independent set, for if there were such an edge the 
rest of the vertices could not be a vertex cover. 




[Figure: a graph with the vertices in the cover and the vertices in the independent set marked]



Clearly the smallest vertex cover gives the biggest in-
dependent set, and so the problems are equivalent -
delete the subset of vertices in one from the total set
of vertices to get the other!

Thus finding the maximum independent set must be 
NP-complete! 



In an independent set, there are no edges between two
vertices. In a clique, there is always an edge between two
vertices. Thus if we complement a graph (have an
edge iff there was no edge in the original graph), a
clique becomes an independent set and an independent
set becomes a clique!




[Figure: a graph with Max Clique = 5 and Max IS = 2, and its complement with Max Clique = 2 and Max IS = 5]



Thus finding the largest clique is NP-complete: 



If VC is a vertex cover in G, then V - VC is a clique
in G'. If C is a clique in G, V - C is a vertex cover in
G'.
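
The three-way correspondence is easy to check on a small graph. In this illustrative Python sketch, edges are stored as `frozenset` pairs and all function names are assumptions; G' is taken to be the complement graph.

```python
from itertools import combinations

def complement(vertices, edges):
    """Edges of the complement graph G': exactly the pairs missing from G."""
    return {frozenset(p) for p in combinations(vertices, 2)} - set(edges)

def is_vertex_cover(edges, subset):
    return all(e & subset for e in edges)

def is_independent_set(edges, subset):
    return not any(e <= subset for e in edges)

def is_clique(edges, subset):
    return all(frozenset(p) in edges for p in combinations(subset, 2))

# A path 1-2-3-4-5: {2, 4} is a vertex cover.
V = {1, 2, 3, 4, 5}
E = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 5)]}
cover = {2, 4}
rest = V - cover
print(is_vertex_cover(E, cover))           # True
print(is_independent_set(E, rest))         # True
print(is_clique(complement(V, E), rest))   # True
```

Building the complement takes O(n^2) time, so both directions of the reduction are polynomial.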



Integer Partition (Subset Sum) 



Instance: A set of integers S and a target integer t. 

Problem: Is there a subset of S which adds up exactly 
to t? 

Example: S = {1,4, 16, 64, 256, 1040, 1041, 1093, 1284, 1344} 
and T = 3754 

Answer: 1 + 16 + 64 + 256 + 1040 + 1093 + 1284 = T 

Observe that integer partition is a number problem, as 
opposed to the graph and logic problems we have seen 
to date. 

Theorem: Integer Partition is NP-complete.

Proof: First, we note that integer partition is in NP.
Guess a subset of the input numbers and simply add
them up.

To prove completeness, we show that vertex cover ∝
integer partition. We use a data structure called an
incidence matrix to represent the graph G.

How many 1's are there in each column? Exactly two.

How many 1's in a particular row? Depends on the
vertex degree.

The reduction from vertex cover will create n + m num-
bers from G.

The numbers from the vertices will be a base-4 real-
ization of rows from the incidence matrix, plus a high
order digit:

i.e. $v_2 = 10100$ becomes $4^5 + (4^4 + 4^2)$.
The numbers from the edges will be $y_i = 4^i$.
The target integer will be

$t = k \times 4^{|E|} + \sum_{j=0}^{|E|-1} 2 \times 4^j$

Why? Each column (digit) represents an edge. We
want a subset of vertices which covers each edge. We
can only use k vertex/numbers, because of the high
order digit of the target.

$x_0 = 100101 = 1041$, $x_2 = 111000 = 1344$, $y_1 = 000010 = 4$

We might get only one instance of each edge in a cover 
- but we are free to take extra edge/numbers to grab 
an extra 1 per column. 
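
The number construction can be reproduced in Python. The graph below is a hypothetical one, chosen because it yields exactly the numbers in the example set S with k = 3; the function names and the brute-force subset-sum check are likewise assumptions for illustration.

```python
from itertools import combinations

def vc_to_subset_sum(num_vertices, edges, k):
    """Build (numbers, target) from a vertex cover instance.

    Edge j owns base-4 digit j; the high-order digit 4^m counts
    how many vertex numbers are used.
    """
    m = len(edges)
    vertex_numbers = [
        4 ** m + sum(4 ** j for j, e in enumerate(edges) if v in e)
        for v in range(num_vertices)
    ]
    edge_numbers = [4 ** j for j in range(m)]
    target = k * 4 ** m + sum(2 * 4 ** j for j in range(m))
    return vertex_numbers + edge_numbers, target

def subset_sum_exists(numbers, target):
    """Brute force over all subsets (fine for ten numbers)."""
    return any(sum(c) == target
               for r in range(len(numbers) + 1)
               for c in combinations(numbers, r))

# Hypothetical 5-vertex, 5-edge graph; edge j is edges[j].
edges = [(0, 1), (1, 3), (0, 4), (1, 2), (2, 3)]
numbers, target = vc_to_subset_sum(5, edges, 3)
print(sorted(numbers))  # [1, 4, 16, 64, 256, 1040, 1041, 1093, 1284, 1344]
print(target)           # 3754
```

Here vertices 1, 3 and 4 form a cover, matching the answer 1 + 16 + 64 + 256 + 1040 + 1093 + 1284 = 3754: edge 1 is covered at both ends, so its edge number 4 is left out.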



VC in G ⇒ Integer Partition in S

Given k vertices covering G, pick the k corresponding
vertex/numbers. Each edge in G is incident on one
or two cover vertices. If it is one, include the corre-
sponding edge/number to give two per column.



Integer Partition in S ⇒ VC in G

Any solution to S must contain exactly k vertex/numbers.
Why? It cannot be more because the target in that
digit is k and it cannot be less because, with at most 3
1's per edge/digit-column, no sum of these can carry
over into the next column. (This is why base-4 numbers
were chosen).

This subset of k vertex/numbers must contain at least
one edge-list per column, since if not there is no way
to account for the two in each column of the target
integer, given that we can pick up at most one edge-
list using the edge number. (Again, the prevention of
carries across digits prevents any other possibilities).

Neat, sweet, and NP-complete!

Notice that this reduction could not be performed in
polynomial time if the numbers were written in unary
(5 = 11111). Big numbers are what make integer parti-
tion hard!



Prove that subgraph isomorphism is NP-complete.



1. Guessing a subgraph of G and proving it is isomor-
phic to H takes O(n^2) time, so it is in NP.

2. Clique ∝ subgraph isomorphism. We must trans-
form all instances of clique into some instances of
subgraph isomorphism. Clique is a special case of
subgraph isomorphism!

Thus the following reduction suffices. Let G = G'
and H = K_k, the complete graph on k nodes.



Other NP-complete Problems

• Partition - can you partition n integers into two
subsets so that the sums of the subsets are equal?

• Bin Packing - how many bins of a given size do
you need to hold n items of variable size?

• Chromatic Number - how many colors do you need 
to color a graph? 

• N X N checkers - does black have a forced win 
from a given position? 

• Scheduling, Code Optimization, Permanent Eval- 
uation, Quadratic Programming, etc. 

Open: Graph Isomorphism, Composite Number, Mini- 
mum Length Triangulation. 



Polynomial or Exponential? 

Just changing a problem a little can make the difference
between it being in P or NP-complete:

P                  NP-complete
Shortest Path      Longest Path
Eulerian Circuit   Hamiltonian Circuit
Edge Cover         Vertex Cover



The first thing you should do when you suspect a prob-
lem might be NP-complete is look in Garey and John-
son, Computers and Intractability. It contains a list of
several hundred problems known to be NP-complete.
Either what you are looking for will be there or you 
might find a closely related problem to use in a reduc- 
tion. 



Techniques for Proving
NP-completeness

1. Restriction - Show that a special case of the prob-
lem you are interested in is NP-complete. For
example, the problem of finding a path of length
k is really Hamiltonian Path.

2. Local Replacement - Make local changes to the
structure. An example is the reduction SAT ∝
3-SAT. Another example is showing isomorphism
is no easier for bipartite graphs:

For any graph, replacing each edge with a path of two edges makes it
bipartite.

3. Component Design - These are the ugly, elaborate
constructions



The Art of Proving Hardness 

Proving that problems are hard is a skill. Once you
get the hang of it, it is surprisingly straightforward and
pleasurable to do. Indeed, the dirty little secret of NP-
completeness proofs is that they are usually easier to
recreate than explain, in the same way that it is usually
easier to rewrite old code than to try to understand
it.

I offer the following advice to those needing to prove 
the hardness of a given problem: 

• Make your source problem as simple (i. e. restricted) 
as possible. 

Never use the general traveling salesman problem
(TSP) as a source problem. Instead, use TSP
on instances restricted to the triangle inequality.
Better, use Hamiltonian cycle, i.e. where all the
weights are 1 or ∞. Even better, use Hamiltonian
path instead of cycle. Best of all, use Hamilto- 
nian path on directed, planar graphs where each 
vertex has total degree 3. All of these problems 
are equally hard, and the more you can restrict 
the problem you are reducing, the less work your 
reduction has to do. 

• Make your target problem as hard as possible. 

Don't be afraid to add extra constraints or free- 
doms in order to make your problem more general 
(at least temporarily). 



• Select the right source problem for the right rea-
son.

Selecting the right source problem makes a big
difference in how difficult it is to prove a problem
hard. This is the first and easiest place to go
wrong.

I usually consider four and only four problems as 
candidates for my hard source problem. Limiting 
them to four means that I know a lot about these 
problems - which variants of these problems are 
hard and which are soft. My favorites are: 

— 3-Sat - that old reliable. . . When none of the 
three problems below seem appropriate, I go 
back to the source. 

— Integer partition - the one and only choice for 
problems whose hardness seems to require us- 
ing large numbers. 

— Vertex cover - for any graph problems whose 
hardness depends upon selection. Chromatic 
number, clique, and independent set all involve 
trying to select the correct subset of vertices 
or edges. 

— Hamiltonian path - for any graph problems 
whose hardness depends upon ordering. If you 
are trying to route or schedule something, this 
is likely your lever. 



• Amplify the penalties for making the undesired
transition.

You are trying to translate one problem into an-
other, while making them stay the same as much
as possible. The easiest way to do this is to be 
bold with your penalties, to punish anyone trying 
to deviate from your proposed solution. "If you 
pick this, then you have to pick up this huge set 
which dooms you to lose." The sharper the con- 
sequences for doing what is undesired, the easier 
it is to prove if and only if. 

• Think strategically at a high level, then build gad-
gets to enforce tactics.

You should be asking these kinds of questions. 
"How can I force that either A or B but not both 
are chosen?" "How can I force that A is taken 
before B?" "How can I clean up the things I did 
not select?" 

• Alternate between looking for an algorithm or a
reduction if you get stuck.

Sometimes the reason you cannot prove hardness 
is that there is an efficient algorithm to solve your 
problem! When you can't prove hardness, it likely 
pays to change your thinking at least for a little 
while to keep you honest. 



Now watch me try it! 

To demonstrate how one goes about proving a problem 
hard, I accept the challenge of showing how a proof can 
be built on the fly. 

I need a volunteer to pick a random problem from the 
400+ hard problems in the back of Garey and John- 
son. 



Hamiltonian Cycle 

Instance: A graph G 

Question: Does the graph contain a HC, i.e. an or-
dering of the vertices $\{v_1, v_2, \ldots, v_n\}$ in which consecutive
vertices, as well as $v_n$ and $v_1$, are joined by edges?

This problem is intimately related to the Traveling Sales-
man.

Question: Is there an ordering of the vertices of a
weighted graph such that $w(v_1, v_n) + \sum_{i=1}^{n-1} w(v_i, v_{i+1}) \le k$?

Clearly, HC ∝ TSP. Assign each edge in G weight
1, any edge not in G weight 2. This new graph has
a Traveling Salesman tour of cost n iff the graph is
Hamiltonian. Thus TSP is NP-complete if we can
show HC is NP-complete.

Theorem: Hamiltonian Circuit is NP-complete

Proof: Clearly HC is in NP - guess a permutation and
check it out. To show it is complete, we use vertex
cover. A vertex cover instance consists of a graph
and a constant k, the maximum size of an acceptable
cover. We must construct another graph. Each edge
in the initial graph will be represented by the following
component:



[Figure: the edge gadget, with one side for each endpoint u and v]




All further connections to this gadget will be through
vertices $v_1$, $v_6$, $u_1$ and $u_6$. The key observation about
this gadget is that there are only three ways to traverse
all the vertices:





Note that in each case, we exit out the same side we
entered. Each side of each edge gadget is associated
with a vertex. Assuming some arbitrary order to the
edges incident on a particular vertex, we can link suc-
cessive gadgets by edges forming a chain of gadgets.
Doing this for all vertices in the original graph creates
n intertwined chains with n entry points and n exits.







Thus we have encoded the information about the initial 
graph. What about k? We set up k additional vertices 
and connect each of these to the n start points and n 
end points of each chain. 







Total size of new graph: GE + K vertices and 12E +
2kN + 2E edges → construction is polynomial in size
and time.

We claim this graph has a HC iff G has a VC of size 
k. 

1. Suppose $\{v_1, v_2, \ldots, v_n\}$ is a HC.

Assume it starts at one of the k selector vertices. It 
must then go through one of the chains of gadgets 
until it reaches a different selector vertex. 

Since the tour is a HC, all gadgets are traversed. 
The k chains correspond to the vertices in the 
cover. 

Note that if both vertices associated with an edge
are in the cover, the gadget will be traversed in two
pieces - otherwise one chain suffices.

To avoid visiting a vertex more than once, each 
chain is associated with a selector vertex. 

2. Now suppose we have a vertex cover of size ≤ k.

We can always add more vertices to the cover to 
bring it up to size k. 

For each vertex in the cover, start traversing the 
chain. At each entry point to a gadget, check if 
the other vertex is in the cover and traverse the 
gadget accordingly. 

Select the selector edges to complete the circuit. 
Neat, sweet, and NP-complete. 



To show that Longest Path or Hamiltonian Path is NP- 
complete, add start and stop vertices and distinguish 
the first and last selector vertices. 



[Figure: the start vertex and the k-1 selector vertices]

This has a Hamiltonian path from start to stop iff the
original graph has a vertex cover of size k.



Give an efficient greedy algorithm that finds an optimal
vertex cover of a tree in linear time.



In a vertex cover we need to have at least one vertex 
for each edge. 

Every tree has at least two leaves, meaning that there
is always an edge which is adjacent to a leaf. Which
vertex can we never go wrong picking? The non-leaf,
since it is the only one which can also cover other
edges!

After trimming off the covered edges, we have a smaller
tree. We can repeat the process until the tree has 0 or
1 edges. When the tree consists only of an isolated
edge, pick either vertex.

All leaves can be identified and trimmed in O(n) time
during a DFS.
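
One way to realize the leaf-trimming idea is with a queue of current leaves rather than a DFS. The Python sketch below is an illustration under stated assumptions: vertices are numbered 0 to n-1, the input is a tree given as an edge list, and `tree_vertex_cover` is a hypothetical name.

```python
from collections import deque

def tree_vertex_cover(n, edges):
    """Greedy minimum vertex cover of a tree: always take a leaf's neighbor."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = deque(v for v in range(n) if len(adj[v]) == 1)
    cover = set()
    while leaves:
        leaf = leaves.popleft()
        if len(adj[leaf]) != 1:
            continue  # its only edge was already covered and removed
        parent = next(iter(adj[leaf]))
        cover.add(parent)
        # Trim every edge covered by the chosen vertex.
        for w in list(adj[parent]):
            adj[w].discard(parent)
            if len(adj[w]) == 1:
                leaves.append(w)  # w has become a leaf of the smaller forest
        adj[parent].clear()
    return cover
```

Every edge is removed exactly once and every vertex enters the queue at most twice, so the total work is O(n).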



Dealing with NP-complete

Problems



Option 1: Algorithm fast in the 

Average case 

Examples are Branch-and-bound for the Traveling Sales- 
man Problem, backtracking algorithms, etc. 



Option 2: Heuristics 

Heuristics are rules of thumb; fast methods to find a 
solution with no requirement that it be the best one. 

Note that the theory of NP-completeness does not
stipulate that it is hard to get close to the answer, 
only that it is hard to get the optimal answer. 

Often, we can prove performance bounds on heuristics, 
that the resulting answer is within C times that of the 
optimal one. 



Approximating Vertex Cover 

As we have seen, finding the minimum vertex cover is
NP-complete. However, a very simple strategy (heuris-
tic) can get us a cover at most twice that of the opti-
mal.

While the graph has edges 

pick an arbitrary edge v,u 

add both u and v to the cover 

delete all edges incident on either u or v



If the graph is represented by an adjacency list this can 
be implemented in O(m + n) time.

This heuristic must always produce a cover, since an
edge is only deleted when it is adjacent to a cover 
vertex. 

Further, any cover uses at least half as many vertices 
as the greedy cover. 



Why? Delete all edges from the graph except the edges 
we selected. 



No two of these edges share a vertex. Therefore, any 
cover of just these edges must include one vertex per 
edge, or half the greedy cover! 
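
The heuristic and its lower-bound argument can both be seen in a short Python sketch. The function name `approx_vertex_cover` is hypothetical. Scanning the edge list once and skipping edges that are already covered has the same effect as deleting edges incident on u or v.

```python
def approx_vertex_cover(edges):
    """Return (cover, picked_edges) for the pick-both-endpoints heuristic.

    picked_edges share no vertices, so any cover needs at least
    len(picked_edges) vertices, while this cover has exactly twice that.
    """
    cover = set()
    picked = []
    for u, v in edges:
        if u not in cover and v not in cover:  # edge not yet "deleted"
            cover.update((u, v))
            picked.append((u, v))
    return cover, picked

# On the path 0-1-2-3 the heuristic returns all four vertices,
# while the optimal cover {1, 2} has two.
cover, picked = approx_vertex_cover([(0, 1), (1, 2), (2, 3)])
print(sorted(cover), picked)  # [0, 1, 2, 3] [(0, 1), (2, 3)]
```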



Things to Notice 

Although the heuristic is simple, it is not stupid. 
Many other seemingly smarter ones can give a far 
worse performance in the worst case. 

Example: Pick one of the two vertices instead of
both (after all, the middle edge is already covered).
The optimal cover is one vertex, the greedy heuris-
tic is two vertices, while the new/bad heuristic can
be as bad as n - 1.




Proving a lower bound on the optimal solution is 
the key to getting an approximation result. 

Making a heuristic more complicated does not nec- 
essarily make it better. It just makes it more dif- 
ficult to analyze. 



A post-processing clean-up step (delete any un-
necessary vertex) can only improve things in prac-
tice, but might not help the bound.



The Euclidean Traveling 

Salesman 

In the traditional version of TSP, a salesman wants
to plan a drive to visit all his customers exactly once
and get back home.

Euclidean geometry satisfies the triangle inequality, d(u, w) ≤
d(u, v) + d(v, w).

TSP remains hard even when the distances are Eu- 
clidean distances in the plane. 




Note that the cost of airfares is an example of a dis- 
tance function which violates the triangle inequality. 

However, we can approximate the optimal Euclidean 
TSP tour using minimum spanning trees. 

Claim: the cost of a MST is a lower bound on the 
cost of a TSP tour. 

Why? Deleting any edge from a TSP tour leaves a 
path, which is a spanning tree of weight at least that 
of the MST! 

If we were allowed to visit cities more than once, 
doing a depth-first traversal of a MST and then 
walking out the tour specified costs at most twice the 
cost of the MST. Why? We will be using each edge 
exactly twice. 





Every edge is used exactly twice in the DFS tour. 
However, how can we avoid revisiting cities? 



We can take a shortest path to the next unvisited 
vertex. The improved tour is 1 - 2 - 3 - 5 - 8 - 9 - 6 - 
4 - 7 - 10 - 11 - 1. Because we replaced a chain of 
edges by a single edge, the triangle inequality ensures 
the tour only gets shorter. Thus this is still within twice 
optimal! 
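
The MST-based tour can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the lecture's code: the name mst_tsp_tour is hypothetical, the input is assumed to be a non-empty list of (x, y) points, and the MST is built with a simple O(n^2) version of Prim's algorithm. Visiting vertices in depth-first preorder is the same as walking the doubled tree and skipping ahead to the next unvisited vertex.

```python
import math


def mst_tsp_tour(points):
    """Return (tour, tour_cost, mst_cost) for a non-empty list of (x, y) points."""
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # Prim's algorithm on the complete Euclidean graph, O(n^2).
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known edge into the tree
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    mst_cost = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        mst_cost += best[u]
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v] = dist(u, v)
                parent[v] = u

    # Preorder traversal of the tree: each vertex is listed on first visit only.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    tour.append(0)                 # return home

    tour_cost = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
    return tour, tour_cost, mst_cost


points = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0.5)]
tour, tour_cost, mst_cost = mst_tsp_tour(points)
print(tour, round(tour_cost, 3), round(2 * mst_cost, 3))
```

The printed tour cost should fall between the MST cost and twice the MST cost, matching the two bounds argued above.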



Finding the Optimal Spouse 

1. There are up to n possible candidates we will see 
over our lifetime, one at a time. 

2. We seek to maximize our probability of getting the 
single best possible spouse. 

3. Our assessment of each candidate is relative to 
what we have seen before. 

4. We must decide either to marry or reject each 
candidate as we see them. There is no going back 
once we reject someone. 

5. Each candidate is ranked from 1 to n, and all 
permutations are equally likely. 



For example, if the input permutation is 

(4,2,3,5,6,1) 
we see (3,1,2) after three candidates. 
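
The relative view in this example can be reproduced with a few lines of Python. This is a small sketch: the helper name relative_ranks is hypothetical, and it assumes distinct ranks with 1 as the best.

```python
def relative_ranks(seen):
    """Rank each candidate among those seen so far (1 = best seen).

    Assumes all values are distinct and that a smaller rank is better.
    """
    order = sorted(seen)
    return tuple(order.index(x) + 1 for x in seen)


permutation = (4, 2, 3, 5, 6, 1)
print(relative_ranks(permutation[:3]))
```

Only these relative ranks are available when each decision is made, which is why the true best candidate cannot be recognized with certainty until everyone has been seen.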

Picking the first or last candidate gives us a probability 
of 1/n of getting the best. 

Since we seek to maximize our chances of getting the 
best, it never pays to pick someone who is not the 
best we have seen. 

The optimal strategy is clearly to sample some fraction 
of the candidates, then pick the first one who is better 
than the best we have seen. 

But what is the fraction? 



For a given fraction 1/f, what is the probability of 
finding the best? 

Suppose i + 1 is the highest ranked person in the first 
n/f candidates. We win whenever the best candidate 
occurs before any number from 2 to i in the last 
n(1 - 1/f) candidates. 

There is a 1/i probability of that, so 



$$P = \sum_{i=1}^{n-1} \Pr[\text{rank } i+1 \text{ is the best of the first } n/f] \cdot \frac{1}{i}$$



In fact, the optimal is obtained by sampling the first 
n/e candidates. 
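
One way to see where the n/e comes from is the following sketch of the calculation. It rests on an assumption not stated in the lecture: n is large, so each of the top-ranked candidates lands in the sample independently with probability 1/f. Under that approximation, rank i + 1 is the best of the sample when it falls inside the sample and ranks 1 through i all fall outside it.

```latex
\begin{align*}
% Assumption: large n, independent placement of the top candidates.
\Pr[\text{best of the sample has rank } i+1]
    &\approx \frac{1}{f}\left(1 - \frac{1}{f}\right)^{i} \\
P(f) &\approx \sum_{i \ge 1} \frac{1}{i} \cdot \frac{1}{f}\left(1 - \frac{1}{f}\right)^{i}
     = \frac{1}{f}\left(-\ln\frac{1}{f}\right)
     = \frac{\ln f}{f} \\
P'(f) &= \frac{1 - \ln f}{f^{2}} = 0
     \quad\Longrightarrow\quad f = e, \qquad P(e) = \frac{1}{e} \approx 0.368
\end{align*}
```

The middle step uses the series $\sum_{i \ge 1} x^{i}/i = -\ln(1 - x)$ with $x = 1 - 1/f$.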



Does this really work? Well, it did for me! 
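
A quick simulation gives an empirical check. This is a sketch under assumptions: the function names, the fixed random seed, and the fallback of taking the last candidate when nobody beats the sample are all illustrative choices, not part of the lecture.

```python
import math
import random


def sample_then_pick(ranks, cutoff):
    """Skip the first `cutoff` candidates, then take the first one who is
    better than everyone seen so far. ranks[i] is the true rank of the
    i-th candidate (1 = best). If nobody qualifies we are left with the
    last candidate (an assumed fallback).
    """
    best_seen = min(ranks[:cutoff], default=math.inf)
    for r in ranks[cutoff:]:
        if r < best_seen:
            return r
    return ranks[-1]


def success_rate(n, cutoff, trials, seed=0):
    """Fraction of random permutations in which the strategy gets rank 1."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    wins = 0
    for _ in range(trials):
        rng.shuffle(ranks)
        wins += sample_then_pick(ranks, cutoff) == 1
    return wins / trials


n = 100
for cutoff in (1, round(n / math.e), n // 2, n - 1):
    print(cutoff, success_rate(n, cutoff, trials=20000))
```

The strategy only ever compares a candidate with the best seen so far, so it uses nothing beyond the relative ranks allowed by the problem. The cutoff near n/e should come out close to a 0.37 success rate, with the other cutoffs doing worse.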



